Applicants: Francois Capman et al. § Group Art Unit:

§ 

Int'l Appl. No.: PCT/FR00/01909

Int'l Filing Date: 4 July 2000 §

§ 

For: Encoding and Decoding with § Atty. Dkt. No.: MTR.0029US 

Harmonic Components and § 
Minimum Phase § 

Box PCT 

Commissioner for Patents 
Washington DC 20231 

PRELIMINARY AMENDMENT 

Sir: 

Prior to Examination, please amend the above-identified application as follows:

In the Specification: 

Page 1, at line 3, please insert the following paragraph:
--BACKGROUND OF THE INVENTION--

Page 2, at line 11, please insert the following paragraph:
--SUMMARY OF THE INVENTION--

Page 3, delete lines 24-27.

Page 3, at line 28, please insert the following paragraph:
--BRIEF DESCRIPTION OF THE DRAWINGS--

Page 5, at line 4, insert the following paragraph:
--DETAILED DESCRIPTION--

In the Abstract:

--A fundamental frequency of the audio signal is determined, as well as a spectrum of
the audio signal, by a transform into the frequency domain of a frame of the audio signal.
Cepstral coefficients are calculated by transforming a compressed upper envelope of the
spectrum into a cepstral domain. The digital output stream of the audio coder includes data
for coding a harmonic component of the audio signal, which data include data representative
of spectral amplitudes associated with frequencies multiple of the fundamental frequency,
obtained from the calculated cepstral coefficients and, for some of the frequencies multiple of
the estimated fundamental frequency, data relating to a phase of the spectrum of the audio
signal in a neighborhood of this multiple frequency.--

In the Claims: 

Amend the following claim: 

1. (Amended) A method of coding an audio signal, comprising the steps of:
estimating a fundamental frequency of the audio signal; 

determining a spectrum of the audio signal through a transform into the frequency 
domain of a frame of the audio signal; 

determining a compressed upper envelope of the spectrum of the audio signal; 

calculating cepstral coefficients by transforming the compressed upper envelope into 
a cepstral domain; 

obtaining data representative of spectral amplitudes associated with frequencies 
multiple of the fundamental frequency from the calculated cepstral coefficients; and 

including data for coding a harmonic component of the audio signal in a digital output
stream,

wherein the data for coding the harmonic component comprise said data 
representative of spectral amplitudes and, for at least one of the frequencies multiple of the 
estimated fundamental frequency, data relating to a phase of the spectrum of the audio signal 
in a neighborhood of said multiple frequency. 

2. (Amended) The method as claimed in claim 1, wherein the step of determining 
the compressed upper envelope comprises interpolating the spectral amplitudes associated 
with the frequencies multiple of the fundamental frequency with application of a spectral 
compression function. 

3. (Amended) The method as claimed in claim 1, further comprising the steps of:

transforming the cepstral coefficients by liftering and smoothing in the cepstral
domain; and

calculating minimum phases of the audio signal at frequencies multiple of the
fundamental frequency on the basis of the transformed cepstral coefficients.
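
For illustration only, and not as part of the claim language, the minimum-phase calculation recited above can be sketched as follows. The sketch assumes that the transformed cepstral coefficients c[n] describe a log-modulus envelope of the form log|X(w)| = c[0] + 2 * sum over n >= 1 of c[n] * cos(n*w), in which case the minimum phase is -2 * sum over n >= 1 of c[n] * sin(n*w). The function name, its parameters, and the sampling-rate argument are hypothetical; the liftering and smoothing steps are assumed to have been applied beforehand.

```python
import math


def minimum_phases(cepstrum, f0, fs, n_harmonics):
    """Return the minimum phase (radians) at each harmonic k*f0, k = 1..n_harmonics.

    cepstrum[n] is the real cepstral coefficient of order n (assumed already
    liftered and smoothed); f0 and fs are in Hz. Hypothetical helper.
    """
    phases = []
    for k in range(1, n_harmonics + 1):
        omega = 2.0 * math.pi * k * f0 / fs  # normalized angular frequency
        # The order-0 coefficient only sets the overall gain and carries no phase.
        phi = -2.0 * sum(c * math.sin(n * omega)
                         for n, c in enumerate(cepstrum) if n > 0)
        phases.append(phi)
    return phases
```

Under the stated convention, the result can be checked against a known minimum-phase filter such as 1 - a*z^-1 with |a| < 1, whose cepstral coefficients are -a^n / (2n).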



4. (Amended) The method as claimed in claim 3, further comprising the steps of:

quantizing a phase discrepancy between the spectrum of the audio signal at at least 
one frequency multiple of the fundamental frequency and the corresponding calculated 
minimum phase; and 

including quantization parameters for said phase discrepancy in the data for coding 
the harmonic component. 
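
For illustration only, and not as part of the claim language, quantizing a phase discrepancy of this kind might be sketched as below. The uniform quantizer, the 3-bit default, and both function names are assumptions made for the example; the claims do not specify a particular quantizer.

```python
import math


def quantize_phase_discrepancy(measured_phase, minimum_phase, bits=3):
    """Return a quantization index for (measured - minimum), wrapped to [-pi, pi)."""
    levels = 2 ** bits
    step = 2.0 * math.pi / levels
    diff = (measured_phase - minimum_phase + math.pi) % (2.0 * math.pi) - math.pi
    # Indices wrap around the circle, so -1 maps to levels - 1.
    return int(round(diff / step)) % levels


def dequantize_phase_discrepancy(index, bits=3):
    """Return the reconstructed discrepancy in [-pi, pi) for a given index."""
    levels = 2 ** bits
    step = 2.0 * math.pi / levels
    value = index * step
    if value >= math.pi:
        value -= 2.0 * math.pi
    return value
```

A decoder following this sketch would add the dequantized discrepancy to its own calculated minimum phase at the same harmonic.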

5. (Amended) The method as claimed in claim 3, wherein said liftering is
adapted to minimize a phase discrepancy between the spectrum of the audio signal at at least
one frequency multiple of the fundamental frequency and the corresponding calculated
minimum phase, the method further comprising the step of:

including parameters representative of the adapted liftering in the data for coding the
harmonic component.

6. (Amended) The method as claimed in claim 5, further comprising the step of:

selecting at least one frequency multiple of the fundamental frequency on the basis of
a modulus magnitude of the spectrum in absolute value,

wherein the minimized phase discrepancy for the adaptation of the liftering relates to
the at least one selected frequency.

7. (Amended) The method as claimed in claim 5, further comprising the steps of:

estimating a curve of spectral masking of the audio signal by means of a psycho-acoustic
model;

selecting at least one frequency multiple of the fundamental frequency on the basis of 
a modulus magnitude of the spectrum in relation to the masking curve, 

wherein the minimized phase discrepancy for the adaptation of the liftering relates to 
the at least one selected frequency. 

8. (Amended) The method as claimed in claim 3, further comprising the steps of:

examining variations of the cepstral coefficients; and 

applying a smoothing of lesser magnitude in the presence of abrupt variations than in 
the presence of slow variations of the cepstral coefficients. 

9. (Amended) The method as claimed in claim 8, further comprising the steps of:

selecting a forget factor based on a comparison between a threshold and a distance 
between two successive sets of cepstral coefficients; and 

smoothing the cepstral coefficients by means of the selected forget factor. 

10. (Amended) The method as claimed in claim 9, wherein said threshold is 
adapted on the basis of variations of the cepstral coefficients. 
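
For illustration only, and not as part of the claim language, the smoothing of claims 8 and 9 might be sketched as a first-order recursive filter whose forget factor is chosen by comparing a distance with a threshold. The Euclidean distance, the two forget-factor values, and the function name are assumptions made for the example. The threshold is passed in as a fixed value, so the adaptation of claim 10 is not shown.

```python
import math


def smooth_cepstrum(prev_smoothed, prev_raw, current_raw, threshold,
                    forget_slow=0.8, forget_abrupt=0.2):
    """Smooth one set of cepstral coefficients; return (smoothed, forget_factor).

    A smaller forget factor gives less weight to the past, i.e. a smoothing of
    lesser magnitude, and is selected when the coefficients vary abruptly.
    """
    # Distance between two successive sets of cepstral coefficients (Euclidean, assumed).
    distance = math.sqrt(sum((c - p) ** 2 for c, p in zip(current_raw, prev_raw)))
    forget = forget_abrupt if distance > threshold else forget_slow
    smoothed = [forget * s + (1.0 - forget) * c
                for s, c in zip(prev_smoothed, current_raw)]
    return smoothed, forget
```

With these hypothetical values, a slowly varying frame is pulled mostly toward the previous smoothed set, while an abrupt change is followed almost immediately.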

11. (Amended) The method as claimed in claim 3, wherein the liftering is of the
form $c_p(i) = [1 + \gamma_2^i - \gamma_1^i] \cdot c(i) - \mu^i / i$, where $c_p(i)$ and $c(i)$ designate the cepstral coefficient of
order $i$ respectively before and after liftering, $\gamma_1$ and $\gamma_2$ are coefficients lying between 0 and 1
and $\mu$ is a pre-emphasizing coefficient.

12. (Amended) The method as claimed in claim 11, wherein $\mu = (\gamma_2 - \gamma_1) \cdot c(1)$.
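
For illustration only, and not as part of the claim language, the liftering of claims 11 and 12 can be sketched as below. The sketch reads the formula as c_p(i) = [1 + gamma2^i - gamma1^i] * c(i) - mu^i / i with mu = (gamma2 - gamma1) * c(1), which is an interpretation of the printed text. The default values of gamma1 and gamma2, the function name, and leaving the order-0 coefficient unchanged are assumptions.

```python
def lifter(cepstrum, gamma1=0.5, gamma2=0.8):
    """Apply the liftering to cepstrum[1:]; return (liftered, mu).

    cepstrum[i] is the cepstral coefficient of order i. The order-0 term is
    passed through unchanged (assumed), since the formula divides by i.
    """
    mu = (gamma2 - gamma1) * cepstrum[1]  # pre-emphasizing coefficient per claim 12
    liftered = [cepstrum[0]]
    for i in range(1, len(cepstrum)):
        weight = 1.0 + gamma2 ** i - gamma1 ** i
        liftered.append(weight * cepstrum[i] - mu ** i / i)
    return liftered, mu
```

Under this reading, the choice of mu in claim 12 leaves the order-1 coefficient unchanged, because (1 + gamma2 - gamma1) * c(1) - (gamma2 - gamma1) * c(1) = c(1).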

13. (Amended) The method as claimed in claim 1, further comprising the steps of:

quantizing a phase variation of the spectrum of the audio signal over at least one 
frequency interval surrounding a frequency multiple of the fundamental frequency; and 

including quantization parameters for said phase variation in the data for coding the 
harmonic component. 

14. (Amended) The method as claimed in claim 13, further comprising the step of:

representing the quantized phase variation by a slope of the phase of the spectrum at 
said frequency multiple of the fundamental frequency. 

15. (Amended) The method as claimed in claim 13, wherein the quantized phase
variation represents a phase model stored in a dictionary determined by learning. 

16. (Amended) An audio coder, comprising: 

means for estimating a fundamental frequency of an audio signal; 
means for determining a spectrum of the audio signal through a transform into the 
frequency domain of a frame of the audio signal; 

means for determining a compressed upper envelope of the spectrum of the audio
signal;

means for calculating cepstral coefficients by transforming the compressed upper 
envelope into a cepstral domain; 

means for obtaining data representative of spectral amplitudes associated with 
frequencies multiple of the fundamental frequency from the calculated cepstral coefficients; 
and 

means for including data for coding a harmonic component of the audio signal in a 
digital output stream, 

wherein the data for coding the harmonic component comprise said data 
representative of spectral amplitudes and, for at least one of the frequencies multiple of the 
estimated fundamental frequency, data relating to a phase of the spectrum of the audio signal 
in a neighborhood of said multiple frequency. 

17. (Amended) The audio coder as claimed in claim 16, wherein the means for 
determining the compressed upper envelope comprise means for interpolating the spectral 
amplitudes associated with the frequencies multiple of the fundamental frequency with 
application of a spectral compression function. 

18. (Amended) The audio coder as claimed in claim 16, further comprising:
means for transforming the cepstral coefficients by liftering and smoothing in the
cepstral domain; and

means for calculating minimum phases of the audio signal at frequencies multiple of 
the fundamental frequency on the basis of the transformed cepstral coefficients. 

19. (Amended) The audio coder as claimed in claim 18, further comprising: 
means for quantizing a phase discrepancy between the spectrum of the audio signal at
at least one frequency multiple of the fundamental frequency and the corresponding 
calculated minimum phase; and 

means for including quantization parameters for said phase discrepancy in the data for 
coding the harmonic component. 

20. (Amended) The audio coder as claimed in claim 18, wherein said liftering is 
adapted to minimize a phase discrepancy between the spectrum of the audio signal at at least 
one frequency multiple of the fundamental frequency and the corresponding calculated 
minimum phase, the audio coder further comprising: 

means for including parameters representative of the adapted liftering in the data for 
coding the harmonic component. 

21. (Amended) The audio coder as claimed in claim 20, further comprising:
means for selecting at least one frequency multiple of the fundamental frequency on
the basis of a modulus magnitude of the spectrum in absolute value,

wherein the minimized phase discrepancy for the adaptation of the liftering relates to 
the at least one selected frequency. 

22. (Amended) The audio coder as claimed in claim 20, further comprising:
means for estimating a curve of spectral masking of the audio signal by means of a
psycho-acoustic model;

means for selecting at least one frequency multiple of the fundamental frequency on 
the basis of a modulus magnitude of the spectrum in relation to the masking curve, 

wherein the minimized phase discrepancy for the adaptation of the liftering relates to 
the at least one selected frequency. 

23. (Amended) The audio coder as claimed in claim 18, further comprising:
means for examining variations of the cepstral coefficients; and 

means for applying a smoothing of lesser magnitude in the presence of abrupt 
variations than in the presence of slow variations of the cepstral coefficients. 

24. (Amended) The audio coder as claimed in claim 23, further comprising:
means for selecting a forget factor based on a comparison between a threshold and a
distance between two successive sets of cepstral coefficients; and

means for smoothing the cepstral coefficients by means of the selected forget factor. 

25. (Amended) The audio coder as claimed in claim 24, wherein said threshold is 
adapted on the basis of variations of the cepstral coefficients. 

26. (Amended) The audio coder as claimed in claim 18, wherein the liftering is of
the form $c_p(i) = [1 + \gamma_2^i - \gamma_1^i] \cdot c(i) - \mu^i / i$, where $c_p(i)$ and $c(i)$ designate the cepstral coefficient of
order $i$ respectively before and after liftering, $\gamma_1$ and $\gamma_2$ are coefficients lying between 0 and 1
and $\mu$ is a pre-emphasizing coefficient.

27. (Amended) The audio coder as claimed in claim 26, wherein $\mu = (\gamma_2 - \gamma_1) \cdot c(1)$.

28. (Amended) The audio coder as claimed in claim 16, further comprising:
means for quantizing a phase variation of the spectrum of the audio signal over at
least one frequency interval surrounding a frequency multiple of the fundamental frequency;
and

means for including quantization parameters for said phase variation in the data for 
coding the harmonic component. 

29. (Amended) The audio coder as claimed in claim 28, further comprising:
means for representing the quantized phase variation by a slope of the phase of the
spectrum at said frequency multiple of the fundamental frequency.

Please add the following new claims: 

30. (New) The audio coder as claimed in claim 28, wherein the quantized phase 
variation represents a phase model stored in a dictionary determined by learning. 

31. (New) A method of decoding a digital input stream representing a coded 
audio signal, comprising the steps of: 
receiving, in the digital input stream, data for coding a fundamental frequency of the
audio signal and data for coding a harmonic component comprising quantization data for 
cepstral coefficients representing an upper spectral envelope of the audio signal; 

determining a modulus of a spectral estimate of the harmonic component from said 
upper spectral envelope at frequencies multiple of the fundamental frequency; 

determining a phase of said spectral estimate of the harmonic component; and 

generating the spectral estimate of the harmonic component having the determined 
modulus and phase, 

wherein the step of determining the phase of the spectral estimate of the harmonic 
component comprises: 

transforming the cepstral coefficients by liftering and smoothing in a cepstral domain;
and

calculating minimum phases of the audio signal at frequencies multiple of the 
fundamental frequency on the basis of the transformed cepstral coefficients. 

32. (New) The method as claimed in claim 31, wherein the step of determining 
the phase of the spectral estimate of the harmonic component further comprises: 

modifying at least one calculated minimum phase on the basis of phase discrepancy 
quantization data included in the data for coding the harmonic component. 

33. (New) The method as claimed in claim 32, wherein the step of determining 
the phase of the spectral estimate of the harmonic component comprises: 

determining a variation of the phase of the spectral estimate of the harmonic 
component over at least one frequency interval surrounding a frequency multiple of the 
fundamental frequency based on said phase discrepancy quantization data. 

34. (New) The method as claimed in claim 33, wherein the phase variation 
determined over a frequency interval is a linear variation represented by a slope quantization
parameter. 

35. (New) The method as claimed in claim 33, wherein the phase variation 
determined over a frequency interval represents a phase model stored in a dictionary 
determined by learning. 

36. (New) The method as claimed in claim 31, wherein said liftering is adapted
on the basis of parameters included in the data for coding the harmonic component. 

37. (New) The method as claimed in claim 31, further comprising the steps of: 
examining variations of the cepstral coefficients; and 

applying a smoothing of lesser magnitude in the presence of abrupt variations than in 
the presence of slow variations of the cepstral coefficients. 

38. (New) The method as claimed in claim 37, further comprising the steps of:
selecting a forget factor based on a comparison between a threshold and a distance
between two successive sets of cepstral coefficients; and

smoothing the cepstral coefficients by means of the selected forget factor. 

39. (New) The method as claimed in claim 38, wherein said threshold is adapted 
on the basis of variations of the cepstral coefficients. 

40. (New) An audio decoder, comprising: 

means for receiving a digital input stream representing a coded audio signal, 
means for extracting, from the digital input stream, data for coding a fundamental 
frequency of the audio signal and data for coding a harmonic component comprising 
quantization data for cepstral coefficients representing an upper spectral envelope of the 
audio signal; 

means for determining a modulus of a spectral estimate of the harmonic component 
from said upper spectral envelope at frequencies multiple of the fundamental frequency; 

means for determining a phase of said spectral estimate of the harmonic component;
and

means for generating the spectral estimate of the harmonic component having the 
determined modulus and phase, 

wherein the means for determining the phase of the spectral estimate of the harmonic 
component comprise: 

means for transforming the cepstral coefficients by liftering and smoothing in a 
cepstral domain; and 

means for calculating minimum phases of the audio signal at frequencies multiple of 
the fundamental frequency on the basis of the transformed cepstral coefficients. 

41. (New) The audio decoder as claimed in claim 40, wherein the means for
determining the phase of the spectral estimate of the harmonic component further comprise: 

means for modifying at least one calculated minimum phase on the basis of phase 
discrepancy quantization data included in the data for coding the harmonic component. 

42. (New) The audio decoder as claimed in claim 41, wherein the means for 
determining the phase of the spectral estimate of the harmonic component comprises: 

means for determining a variation of the phase of the spectral estimate of the 
harmonic component over at least one frequency interval surrounding a frequency multiple of 
the fundamental frequency based on said phase discrepancy quantization data. 

43. (New) The audio decoder as claimed in claim 42, wherein the phase variation 
determined over a frequency interval is a linear variation represented by a slope quantization
parameter. 

44. (New) The audio decoder as claimed in claim 42, wherein the phase variation 
determined over a frequency interval represents a phase model stored in a dictionary 
determined by learning. 

45. (New) The audio decoder as claimed in claim 40, wherein said liftering is 
adapted on the basis of parameters included in the data for coding the harmonic component. 

46. (New) The audio decoder as claimed in claim 40, further comprising: 
means for examining variations of the cepstral coefficients; and 

means for applying a smoothing of lesser magnitude in the presence of abrupt 
variations than in the presence of slow variations of the cepstral coefficients. 

47. (New) The audio decoder as claimed in claim 46, further comprising:
means for selecting a forget factor based on a comparison between a threshold and a
distance between two successive sets of cepstral coefficients; and

means for smoothing the cepstral coefficients by means of the selected forget factor. 

48. (New) The audio decoder as claimed in claim 47, wherein said threshold is
adapted on the basis of variations of the cepstral coefficients. 

49. A method of decoding a digital input stream representing a coded audio signal, 
comprising the steps of: 

receiving, in the digital input stream, data for coding a fundamental frequency of the 
audio signal and data for coding a harmonic component comprising quantization data for 
cepstral coefficients representing an upper spectral envelope of the audio signal; 

determining a modulus of a spectral estimate of the harmonic component from said 
upper spectral envelope at frequencies multiple of the fundamental frequency; 

determining a phase of said spectral estimate of the harmonic component; and 

generating the spectral estimate of the harmonic component having the determined 
modulus and phase, 

wherein the step of determining the phase of the spectral estimate of the harmonic 
component comprises: 

determining a phase variation of said spectral estimate over at least one frequency 
interval surrounding a frequency multiple of the fundamental frequency on the basis of 
quantization parameters included in the data for coding the harmonic component. 

50. (New) The method as claimed in claim 49, wherein the phase variation 
determined over a frequency interval is a linear variation represented by a slope quantization
parameter. 

51. (New) The method as claimed in claim 49, wherein the phase variation 
determined over a frequency interval represents a phase model stored in a dictionary 
determined by learning. 

52. (New) An audio decoder, comprising: 

means for receiving a digital input stream representing a coded audio signal, 
means for extracting, from the digital input stream, data for coding a fundamental 
frequency of the audio signal and data for coding a harmonic component comprising 
quantization data for cepstral coefficients representing an upper spectral envelope of the 
audio signal; 

means for determining a modulus of a spectral estimate of the harmonic component
from said upper spectral envelope at frequencies multiple of the fundamental frequency;

means for determining a phase of said spectral estimate of the harmonic component;
and

means for generating the spectral estimate of the harmonic component having the 
determined modulus and phase, 

wherein the means for determining the phase of the spectral estimate of the harmonic 
component comprise: 

means for determining a phase variation of said spectral estimate over at least one 
frequency interval surrounding a frequency multiple of the fundamental frequency on the 
basis of quantization parameters included in the data for coding the harmonic component. 

53. (New) The audio decoder as claimed in claim 52, wherein the phase variation 
determined over a frequency interval is a linear variation represented by a slope quantization
parameter. 

54. (New) The audio decoder as claimed in claim 52, wherein the phase variation 
determined over a frequency interval represents a phase model stored in a dictionary 
determined by learning. 

Remarks: 

Allowance of all claims is respectfully requested. The Commissioner is authorized to 
charge any additional fees under 37 C.F.R. § 1.16 and § 1.17, or credit any overpayment to 
Deposit Account No. 20-1504 (MTR.0029US). 



Respectfully submitted,

TROP, PRUNER & HU, P.C.
8554 Katy Freeway, Suite 100
Houston, Texas 77024-1805
(713) 468-8880 [Phone]
(713) 468-8883 [Fax]

VERSIONS WITH MARKINGS TO SHOW CHANGES

IN THE CLAIMS:

Amendments of the claims are indicated below:

1. (Amended) A method of coding an audio signal, comprising the steps of:

[in which] estimating a fundamental frequency [(F0)] of the audio signal;

[signal is estimated,] determining a spectrum of the audio signal [is determined]
through a transform into the frequency domain of a frame of the audio signal;

determining a compressed upper envelope of the spectrum of the audio signal; 

calculating cepstral coefficients by transforming the compressed upper envelope into 
a cepstral domain; 

obtaining data representative of spectral amplitudes associated with frequencies
multiple of the fundamental frequency [(F0) are obtained by means of cepstral coefficients
calculated by transforming in the cepstral domain a compressed upper envelope (LX_sup) of
the spectrum of the audio signal, and in that] from the calculated cepstral coefficients; and

including data for coding a harmonic component of the audio signal in a digital output
stream,

wherein the data for coding the harmonic component comprise said data
representative of spectral amplitudes and, for at least one of the frequencies multiple of the
estimated fundamental frequency, data relating to a phase of the spectrum of the audio signal
in a neighborhood of said multiple frequency.

2. (Amended) The method as claimed in claim 1, [in which] wherein the step of
determining the compressed upper envelope [(LX_sup) is determined by interpolation of
said] comprises interpolating the spectral amplitudes associated with the frequencies [which
are multiples] multiple of the fundamental frequency [(F0)] with application of a spectral
compression function.

3. (Amended) The method as claimed in claim 1, further comprising the steps
of: [1 or 2, in which]

transforming the cepstral coefficients [are transformed] by liftering and smoothing in
the cepstral domain; and

[domain, and] calculating minimum phases [φ(k)] of the audio signal at frequencies
[which are multiples] multiple of the fundamental frequency [(F0) are calculated] on the basis
of the transformed cepstral coefficients [(cxl[n])].



4. (Amended) The method as claimed in claim 3, further comprising the steps
of:

[in which] quantizing a phase discrepancy between the spectrum of the audio signal at
at least one frequency [which is a] multiple of the fundamental frequency [(F0)] and the
corresponding calculated minimum phase; and [phase (φ(k)) is quantized, and]

including quantization parameters [(iΔφ)] for said phase discrepancy [are included] in
the data for coding the harmonic component.

5. (Amended) The method as claimed in claim 3, [in which] wherein said
liftering is adapted [in such a way as] to minimize a phase discrepancy between the spectrum
of the audio signal at at least one frequency [which is a] multiple of the fundamental
frequency [(F0)] and the corresponding calculated minimum phase, the method further
comprising the step of:

[phase (φ(k)), and] including parameters [(iLif)] representative of the adapted liftering
[are included] in the data for coding the harmonic component.

6. (Amended) The method as claimed in claim 5, [in which] further comprising
the step of:

[the minimized phase discrepancy for the adaptation of the liftering relates to]
selecting at least one frequency [which is a] multiple of the fundamental frequency [(F0),
selected] on the basis of [the] a modulus magnitude [of the modulus] of the spectrum in
absolute [value] value, wherein the minimized phase discrepancy for the adaptation of the
liftering relates to the at least one selected frequency.

7. The method as claimed in claim 5, further comprising the steps of:

[in which] estimating a curve of spectral masking of the audio signal [is estimated] by
means of a psycho-acoustic [model, and] model;

[the minimized phase discrepancy for the adaptation of the liftering relates to] selecting at
least one frequency [which is a] multiple of the fundamental frequency [(F0), selected] on the
basis of [the] a modulus magnitude [of the modulus] of the spectrum in relation to the
masking curve, wherein the minimized phase discrepancy for the adaptation of the liftering
relates to the at least one selected frequency.

8. (Amended) The method as claimed in claim 3, further comprising the steps 
of: [any one of claims 3 to 7, in which the] 

examining variations of the cepstral coefficients; and 

[coefficients (cxp[n]) are examined so as to apply] applying a smoothing of lesser 
magnitude in the presence of abrupt variations than in the presence of slow variations of the 
cepstral coefficients . 

9. (Amended) The method as claimed in claim 8, [in which the smoothing of]
further comprising the steps of:

[the cepstral coefficients performed by means of] selecting a forget factor [(λc) chosen
as a function of] based on a comparison between a threshold [(dth)] and a distance [(d)]
between two successive sets of cepstral [coefficients (cxp[n-1], cxp[n])] coefficients; and

smoothing the cepstral coefficients by means of the selected forget factor.

10. (Amended) The method as claimed in claim 9, [wherein] wherein said
threshold [(dth)] is adapted [on the basis of] on the basis of variations of the cepstral
coefficients [(cxp[n])].

11. (Amended) The method as claimed in [in any one of claims 3 to 10, in which]
claim 3, wherein the liftering is of the form $c_p(i) = [1 + \gamma_2^i - \gamma_1^i] \cdot c(i) - \mu^i / i$, where $c_p(i)$ and
$c(i)$ designate the cepstral coefficient of order $i$ respectively before and after liftering, [and] $\gamma_1$
and $\gamma_2$ are coefficients lying between 0 and 1 and $\mu$ is a pre-emphasizing coefficient.

12. (Amended) The method as claimed in claim 11, [in which] wherein $\mu = (\gamma_2 - \gamma_1) \cdot c(1)$.

13. (Amended) The method as claimed in claim 1, further comprising the steps
of: [any one of claims 1 to 12, in which a variation of the phase]

quantizing a phase variation of the spectrum of the audio signal [is quantized] over at
least one frequency interval [of frequencies] surrounding a frequency [which is a] multiple of
the fundamental frequency; and

[frequency (F0), and] including quantization parameters [(iΔφ)] for said phase
variation [are included] in the data for coding the harmonic component.

14. The method as claimed in claim 13, further comprising the step of:

[in which] representing the quantized phase variation [is represented] by [the] a slope
of the phase of the spectrum at [the] said frequency [which is a] multiple of the fundamental
frequency [(F0)].

15. (Amended) The method as claimed in claim 13, [in which] wherein the 
quantized phase variation represents a phase model stored in a dictionary determined by 
learning. 



16. (Amended) An audio coder, [comprising means for executing a method
according to any one of claims 1 to 15] comprising:

means for estimating a fundamental frequency of an audio signal;

means for determining a spectrum of the audio signal through a transform into the
frequency domain of a frame of the audio signal;

means for determining a compressed upper envelope of the spectrum of the audio
signal;

means for calculating cepstral coefficients by transforming the compressed upper
envelope into a cepstral domain;

means for obtaining data representative of spectral amplitudes associated with
frequencies multiple of the fundamental frequency from the calculated cepstral coefficients;
and

means for including data for coding a harmonic component of the audio signal in a
digital output stream,

wherein the data for coding the harmonic component comprise said data
representative of spectral amplitudes and, for at least one of the frequencies multiple of the
estimated fundamental frequency, data relating to a phase of the spectrum of the audio signal
in a neighborhood of said multiple frequency.



17. (Amended) The audio coder as claimed in claim 16, wherein the means for
determining the compressed upper envelope comprise means for interpolating the spectral
amplitudes associated with the frequencies multiple of the fundamental frequency with
application of a spectral compression function [A method of decoding a digital input stream
(Φ) representing a coded audio signal, in which data for coding a fundamental frequency (F0)
of the audio signal and data for coding a harmonic component comprising quantization data
for cepstral coefficients representing an upper spectral envelope (X_sup) of the audio signal
are received in the digital input stream, and a spectral estimate (Xv) of the harmonic
component is generated, its modulus being determined by said upper spectral envelope at the
frequencies which are multiples of the fundamental frequency, characterized in that, to
determine the phase (φv) of the spectral estimate of the harmonic component (Xv), the
cepstral coefficients are transformed by liftering and smoothing in the cepstral domain, and
minimum phases (φ(k)) of the audio signal at frequencies which are multiples of the
fundamental frequency are calculated on the basis of the transformed cepstral coefficients
(cxl[n])].

18. (Amended) [The method as claimed in claim 17, in which, to obtain the phase
(φv) of the spectral estimate (Xv) of the harmonic component, at least one calculated
minimum phase (φ(k)) is modified on the basis of phase discrepancy quantization data (iΔφ)
included in the data for coding the harmonic component] The audio coder as claimed in claim
16, further comprising:

means for transforming the cepstral coefficients by liftering and smoothing in the 
cepstral domain; and 

means for calculating minimum phases of the audio signal at frequencies multiple of 
the fundamental frequency on the basis of the transformed cepstral coefficients . 

19. (Amended) [The method as claimed in claim 18, in which the phase 
discrepancy quantization data (iΔφ) are used to determine a variation of the phase (φ_v) of the 
spectral estimate (X_v) of the harmonic component over at least one interval of frequencies 
surrounding a frequency which is a multiple of the fundamental frequency (F0)] The audio 
coder as claimed in claim 18, further comprising: 



means for quantizing a phase discrepancy between the spectrum of the audio signal at 
at least one frequency multiple of the fundamental frequency and the corresponding 
calculated minimum phase; and 

means for including quantization parameters for said phase discrepancy in the data for 
coding the harmonic component . 

20. (Amended) [The method as claimed in claim 19, in which the phase variation 
determined over an interval of frequencies is a linear variation represented by a slope 
quantization parameter] The audio coder as claimed in claim 18, wherein said liftering is 
adapted to minimize a phase discrepancy between the spectrum of the audio signal at at least 
one frequency multiple of the fundamental frequency and the corresponding calculated 
minimum phase, the audio coder further comprising: 

means for including parameters representative of the adapted liftering in the data for 
coding the harmonic component . 

21. (Amended) [The method as claimed in claim 19, in which the phase variation 
determined over an interval of frequencies represents a phase model stored in a dictionary 
determined by learning] The audio coder as claimed in claim 20, further comprising: 

means for selecting at least one frequency multiple of the fundamental frequency on 
the basis of a modulus magnitude of the spectrum in absolute value, 

wherein the minimized phase discrepancy for the adaptation of the liftering relates to 
the at least one selected frequency . 

22. (Amended) [The method as claimed in any one of claims 17 to 21, in which 
said liftering is adapted as a function of parameters (iLif) included in the data for coding the 
harmonic component] The audio coder as claimed in claim 20, further comprising: 

means for estimating a curve of spectral masking of the audio signal by means of a 
psycho-acoustic model; 

means for selecting at least one frequency multiple of the fundamental frequency on 
the basis of a modulus magnitude of the spectrum in relation to the masking curve, 

wherein the minimized phase discrepancy for the adaptation of the liftering relates to 
the at least one selected frequency . 



23. (Amended) [The method as claimed in any one of claims 17 to 22, in which 
the variations of the cepstral coefficients (cxp[n]) are examined so as to apply a smoothing of 
lesser magnitude in the presence of abrupt variations than in the presence of slow variations] 
The audio coder as claimed in claim 18, further comprising: 

means for examining variations of the cepstral coefficients; and 
means for applying a smoothing of lesser magnitude in the presence of abrupt 
variations than in the presence of slow variations of the cepstral coefficients . 

24. (Amended) [The method as claimed in claim 23, in which the smoothing of 
the cepstral coefficients (cxp[n]) is performed by means of a forget factor (λ_c) chosen as a 
function of a comparison between a threshold (d_th) and a distance (d) between two successive 
sets of cepstral coefficients (cxp[n-1], cxp[n])] The audio coder as claimed in claim 23, 
further comprising: 

means for selecting a forget factor based on a comparison between a threshold and a 
distance between two successive sets of cepstral coefficients; and 

means for smoothing the cepstral coefficients by means of the selected forget factor . 

25. (Amended) [The method as claimed in claim 24, in which said threshold (d_th) 
is adapted as a function of the variations of the cepstral coefficients (cxp[n])] The audio coder 
as claimed in claim 24, wherein said threshold is adapted on the basis of variations of the 
cepstral coefficients . 

26. (Amended) [A method of decoding a digital input stream (Φ) representing a 
coded audio signal, in which data for coding a fundamental frequency (F0) of the audio signal 
and data for coding a harmonic component comprising quantization data for cepstral 
coefficients representing an upper spectral envelope (X_sup) of the audio signal are received 
in the digital input stream, and a spectral estimate (X_v) of the harmonic component is 
generated, its modulus being determined by said upper spectral envelope at the frequencies 
which are multiples of the fundamental frequency, characterized in that a phase variation (φ_v) 
of the spectral estimate of the harmonic component is determined over at least one interval of 
frequencies surrounding a frequency which is a multiple of the fundamental frequency on the 
basis of quantization parameters (iΔφ) included in the data for coding the harmonic 
component] The audio coder as claimed in claim 18, wherein the liftering is of the form 

$c_p(i) = \left[1 + \gamma_2^i - \gamma_1^i\right] c(i) - \mu^i / i$, where $c(i)$ and $c_p(i)$ designate the cepstral coefficient of order i 
respectively before and after liftering, $\gamma_1$ and $\gamma_2$ are coefficients lying between 0 and 1 and $\mu$ 
is a pre-emphasizing coefficient . 

27. (Amended) [The method as claimed in claim 26, in which the phase variation 
determined over an interval of frequencies is a linear variation represented by a slope 
quantization parameter] The audio coder as claimed in claim 26, wherein $\mu = (\gamma_2 - \gamma_1) \, c(1)$ . 

28. (Amended) [The method as claimed in claim 26, in which the phase variation 
determined over an interval of frequencies represents a phase model stored in a dictionary 
determined by learning] The audio coder as claimed in claim 16, further comprising: 

means for quantizing a phase variation of the spectrum of the audio signal over at 
least one frequency interval surrounding a frequency multiple of the fundamental frequency; 
and 

means for including quantization parameters for said phase variation in the data for 
coding the harmonic component . 

29. (Amended) [An audio decoder, comprising means for executing a method 
according to any one of claims 17 to 28] The audio coder as claimed in claim 28, further 
comprising: 

means for representing the quantized phase variation by a slope of the phase of the 
spectrum at said frequency multiple of the fundamental frequency . 



(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization (WIPO) 
International Bureau 

(43) International publication date: 11 January 2001 (11.01.2001) 
(10) International publication number: WO 01/03121 A1 

(51) International patent classification: G10L 19/02 
(21) International application number: PCT/FR00/01909 
(22) International filing date: 4 July 2000 (04.07.2000) 
(25) Language of filing: French 
(26) Language of publication: French 
(30) Data relating to the priority: 99/08,633, 5 July 1999 (05.07.1999) 



(71) Applicant (for all designated States except US): 
MATRA NORTEL COMMUNICATIONS [FR/FR]; 
50, rue du President Sadate, F-29100 Quimper (FR). 

(72) Inventors; and 

(75) Inventors/Applicants (US only): CAPMAN, Francois 
[FR/FR]; 47, rue des Etats-Generaux, F-78000 
Versailles (FR). MURGIA, Carlo [IT/US]; 458 Saint 
Vincent, Irvine, CA 92618 (US). 



(74) Representatives: LOISEL, Bertrand etc.; Cabinet 
Plasseraud, 84, rue d'Amsterdam, F-75440 Paris 
Cedex 09 (FR). 

(81) Designated states (national): AE, AG, AL, AM, AT, 
AU, AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CR, 

[continued on next page] 



As printed 

(54) Title: ENCODING AND DECODING WITH HARMONIC COMPONENTS AND MINIMUM PHASE 

(54) Titre: CODAGE ET DECODAGE AUDIO AVEC COMPOSANTS HARMONIQUES ET PHASE MINIMALE 



[Figure: block diagram with signals cx_sup, cxp[n] and cxl[n]; the block labels are translated in the legend below.] 


15,140... POST-LIFTERING 
18... QUANTIZATION 

28... SPECTRAL AMPLITUDE CALCULATION 
29... POST-LIFTERING ADAPTATION 
141... SMOOTHING 
142... MINIMUM PHASE RETRIEVAL 
148... PHASE DIFFERENCE QUANTIZATION 



(57) Abstract: The invention concerns a method wherein the encoder 
estimates a fundamental frequency (F0) of an audio signal, determines 
a spectrum of the audio signal by a transform into the frequency domain 
of a frame of the audio signal, and includes in the digital stream 
transmitted to the decoder data encoding a harmonic component of the 
audio signal, comprising data representing spectral amplitudes 
associated with frequencies that are multiples of the estimated 
fundamental frequency. The data encoding the harmonic component further 
comprise, for at least one of the multiples of the estimated fundamental 
frequency, data (iΔφ) concerning the phase of the audio signal spectrum 
in the neighbourhood of said multiple frequency. 

AUDIO CODING AND DECODING WITH HARMONIC COMPONENTS AND 
MINIMUM PHASE 

The present invention relates to the field of the 
coding of audio signals. It applies in particular, but 
not exclusively, to the coding of speech, in narrowband 
or in broadband, in various coding bit rate ranges. 

The design of an audio codec is aimed chiefly at 
providing a good compromise between the bit rate of the 
stream transmitted by the coder and the quality of the 
audio signal which the decoder is capable of 
reconstructing from this stream. 

With this in mind, families of coders have in 
particular been developed which are based on analyzing 
the audio signal in the spectral domain: the coder 
estimates a fundamental frequency of the signal, 
representing its pitch, and the spectral analysis 
consists in determining parameters representing the 
harmonic structure of the signal at the frequencies 
which are integer multiples of this fundamental 
frequency. Modeling of the nonharmonic, or unvoiced, 
component may also be performed in the spectral domain. 
The parameters transmitted to the decoder typically 
represent the modulus of the spectrum of the voiced and 
unvoiced components. Added thereto is information 
representing either voiced/unvoiced decisions relating 
to various portions of the spectrum, or information 
regarding the probability of voicing of the signal, 
allowing the decoder to determine those portions of the 
spectrum in which it must use the voiced component or 
the unvoiced component. 

These families of coders comprise the coders of the MBE 
type (standing for "Multi-Band Excitation"), or else 
the coders of the STC type (standing for "Sinusoidal 
Transform Coder"). By way of reference, mention may be 
made of US patents 4 856 068, 4 885 790, 4 937 873, 
5 054 072, 5 081 681, 5 195 166, 5 216 747, 5 226 084, 
5 226 108, 5 247 579, 5 473 727, 5 517 511, 5 630 011, 
5 630 012, 5 649 050, 5 651 093, 5 664 051, 5 664 052, 
5 684 926, 5 701 390, 5 715 365, 5 749 065, 5 752 222, 
5 765 127, 5 774 837 and 5 890 108. 

An aim of the present invention is to make it possible, 
in a coding scheme with analysis in the spectral 
domain, to improve the modeling of the phases of the 
spectrum of the signal by the decoder. 

The invention thus proposes a method of coding an audio 
signal, in which a fundamental frequency of the audio 
signal is estimated, a spectrum of the audio signal is 
determined through a transform in the frequency domain 
of a frame of the audio signal, and data for coding a 
harmonic component of the audio signal, comprising data 
representative of spectral amplitudes associated with 
frequencies which are multiples of the estimated 
fundamental frequency, are included in a digital output 
stream. According to the invention said data 
representative of spectral amplitudes associated with 
frequencies which are multiples of the fundamental 
frequency are obtained by means of cepstral 
coefficients calculated by transforming in the cepstral 
domain a compressed upper envelope of the spectrum of 
the audio signal, and the data for coding the harmonic 
component furthermore comprise, for at least one of the 
frequencies which are multiples of the estimated 
fundamental frequency, data relating to the phase of 
the spectrum of the audio signal in the neighborhood of 
said multiple frequency. 

Another aspect of the present invention relates to a 
method of decoding a digital input stream representing 
a coded audio signal, in which data for coding a 
fundamental frequency of the audio signal and data for 
coding a harmonic component comprising quantization 
data for cepstral coefficients representing an upper 
spectral envelope of the audio signal are received in 
the digital input stream, and a spectral estimate of 
the harmonic component is generated, its modulus being 
determined by said upper spectral envelope at the 
frequencies which are multiples of the fundamental 
frequency. According to the invention, to determine the 
phase of the spectral estimate of the harmonic 
component, the cepstral coefficients are transformed by 
liftering and smoothing in the cepstral domain, and 
minimum phases of the audio signal at frequencies which 
are multiples of the fundamental frequency are 
calculated on the basis of the transformed cepstral 
coefficients. According to the invention, it is also 
possible to determine a phase variation of the spectral 
estimate of the harmonic component over at least one 
interval of frequencies surrounding a frequency which 
is a multiple of the fundamental frequency on the basis 
of quantization parameters included in the data for 
coding the harmonic component. 

The invention also proposes an audio coder and decoder 
comprising means for implementing the above methods. 

Other features and advantages of the present invention 
will become apparent in the description below of 
non-limiting exemplary embodiments, with reference to 
the appended drawings, in which: 

figure 1 is a schematic diagram of an audio coder 
according to the invention; 

figures 2 and 3 are charts illustrating the 
formation of the audio signal frames in the coder 
of figure 1; 

figures 4 and 5 are graphs showing an exemplary 
spectrum of the audio signal and illustrating the 
extraction of the upper and lower envelopes of 
this spectrum; 

figure 6 is a schematic diagram of an example of 
quantization means usable in the coder of 
figure 1; 

figure 7 is a schematic diagram of means usable to 
extract parameters relating to the phase of the 
non-harmonic component in a variant of the coder 
of figure 1; 

figure 8 is a schematic diagram of an audio 
decoder corresponding to the coder of figure 1; 

figure 9 is a flowchart of an exemplary procedure 
for smoothing spectral coefficients and for 
extracting minimum phases implemented in the 
decoder of figure 8; 

figure 10 is a schematic diagram of modules for 
analysis and for spectral mixing of harmonic and 
non-harmonic components of the audio signal; 

figures 11 to 13 are graphs showing examples of 
nonlinear functions usable in the analysis module 
of figure 10; 

figures 14 and 15 are charts illustrating a way of 
carrying out the temporal synthesis of the signal 
frames in the decoder of figure 8; 

figures 16 and 17 are graphs showing windowing 
functions usable in the synthesis of the frames 
according to figures 14 and 15; 

figures 18 and 19 are schematic diagrams of 
interpolation means usable in a variant embodiment 
of the coder and of the decoder; 

figure 20 is a schematic diagram of interpolation 
means usable in another variant embodiment of the 
coder; 

figures 21 and 22 are charts illustrating another 
way of carrying out the temporal synthesis of the 
signal frames in the decoder of figure 8, with the 
aid of an interpolation of parameters; 

figures 23 to 25 are schematic diagrams of variant 
means of post-processing the cepstral coefficients 
representing the upper envelope of the spectrum of 
the signal in the coder of figure 1; and 

figure 26 is a partial schematic diagram of a 
decoder associated with a coder according to 
figure 25. 

The coder and decoder described hereinbelow are digital 
circuits which can, as is customary in the field of 
audio signal processing, be embodied by programming a 
digital signal processor (DSP) or an application 
specific integrated circuit (ASIC). 

The audio coder represented in figure 1 processes an 
audio input signal x which, in the nonlimiting example 
considered hereinbelow, is a speech signal. The signal 
x is available in digital form, for example at a 
sampling frequency Fe of 8 kHz. It is, for example, 
delivered by an analog/digital converter processing the 
amplified output signal from a microphone. The input 
signal x can also be formed from another version, 
analog or digital, coded or uncoded, of the speech 
signal. 

The coder comprises a module 1 which forms successive 
frames of audio signal for the various processing 
operations performed, and an output multiplexer 6 which 
delivers an output stream Φ containing, for each frame, 
sets of quantization parameters from which a decoder 
will be capable of synthesizing a decoded version of 
the audio signal. 

The structure of the frames is illustrated by figures 2 
and 3. Each frame 2 is composed of a number N of 
consecutive samples of the audio signal x. The 
successive frames exhibit mutual time shifts 
corresponding to M samples, so that their overlap is 
L = N-M samples of the signal. In the example 
considered, where N = 256, M = 160 and L = 96, the 
duration of the frames 2 is N/Fe = 32 ms, and a frame 
is formed every M/Fe = 20 ms. 
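
The frame structure just described can be illustrated by a minimal sketch, given here only as an aid to reading. The function names `frame_starts` and `split_frames` are hypothetical and do not appear in the description; only N, M, L and the sampling frequency come from the text.

```python
def frame_starts(num_samples, N=256, M=160):
    """Start indices of the complete frames: N samples each, shifted by M."""
    if num_samples < N:
        return []
    return list(range(0, num_samples - N + 1, M))


def split_frames(x, N=256, M=160):
    """Cut the sample list x into overlapping frames (overlap L = N - M)."""
    return [x[s:s + N] for s in frame_starts(len(x), N, M)]


Fe = 8000.0                  # sampling frequency in Hz (example value)
N, M = 256, 160              # frame length and frame shift, in samples
L = N - M                    # overlap between successive frames
frame_ms = 1000.0 * N / Fe   # frame duration in milliseconds
hop_ms = 1000.0 * M / Fe     # interval between frame starts in milliseconds
```

With the exemplary values, the last L samples of one frame coincide with the first L samples of the next one.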
In a conventional manner, the module 1 multiplies the 
samples of each frame 2 by a windowing function f_A, 
preferably chosen for its good spectral properties. The 
samples x(i) of the frame being numbered from i = 0 to 
i = N-1, the analysis window f_A(i) can thus be a 
Hamming window, expressed by: 

$$f_A(i) = 0.54 + 0.46 \cos\left( 2\pi \, \frac{i - (N-1)/2}{N} \right) \qquad (1)$$

or a Hanning window, expressed by: 

or else a Kaiser window, expressed by: 

where α is a coefficient equal, for example, to 6, and 
I0(.) designates the Bessel function of index 0. 

The coder of figure 1 carries out an analysis of the 
audio signal in the spectral domain. It comprises a 
module 3 which calculates the fast Fourier transform 
(FFT) of each signal frame. The signal frame is shaped 
before being subjected to the FFT module 3: the module 
1 appends N = 256 zero samples thereto so as to obtain 
the maximum resolution of the Fourier transform, and it 
moreover performs a circular permutation of the 
2N = 512 samples so as to compensate for the phase 
effects resulting from the analysis window. This 
modification of the frame is illustrated by figure 3. 
The frame whose fast Fourier transform is calculated on 
2N = 512 points commences with the last N/2 = 128 
weighted samples of the frame, followed by the N = 256 
zero samples, and terminates with the first N/2 = 128 
weighted samples of the frame. 
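
The shaping of the frame before the FFT can be sketched as follows. This is an illustration only: the function name `shape_for_fft` is hypothetical, and the sketch assumes an even frame length N and a frame already multiplied by the analysis window.

```python
def shape_for_fft(weighted):
    """Build the 2N-point FFT input from a windowed frame of N samples.

    The result starts with the last N/2 weighted samples, continues with
    N zero samples and ends with the first N/2 weighted samples, which is
    the zero-padded frame after a circular permutation by N/2 positions.
    """
    N = len(weighted)
    if N % 2 != 0:
        raise ValueError("this sketch assumes an even frame length N")
    half = N // 2
    return list(weighted[half:]) + [0.0] * N + list(weighted[:half])
```

For a toy frame of four samples [1, 2, 3, 4], the shaped frame is [3, 4, 0, 0, 0, 0, 1, 2].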

The FFT module 3 obtains the spectrum of the signal for 
each frame, whose modulus and phase are respectively 
denoted |X| and φ_X, or |X(i)| and φ_X(i) for the 
frequency indices i = 0 to i = 2N-1 (by virtue of the 
symmetry of the Fourier transform and of the frames, we 
may confine ourselves to the values for 0 ≤ i < N). 

A fundamental-frequency detector 4 estimates for each 
signal frame a value of the fundamental frequency F0. 
The detector 4 can apply any known procedure for 
analyzing the speech signal of the frame to estimate 
the fundamental frequency F0, for example a procedure 
based on the autocorrelation function or the AMDF 
function, possibly preceded by a module for whitening 
by linear prediction. The estimate can also be made in 
the spectral domain or in the cepstral domain. Another 
possibility is to evaluate the time intervals between 
the consecutive breaks in the speech signal which are 
attributable to closures of the talker's glottis 
occurring over the duration of the frame. Well-known 
procedures which can be used to detect such microbreaks 
are described in the following articles: M. Basseville 
et al., "Sequential detection of abrupt changes in 
spectral characteristics of digital signals" (IEEE 
Trans. on Information Theory, 1983, Vol. IT-29, No. 5, 
pages 708-723); R. Andre-Obrecht, "A new statistical 
approach for the automatic segmentation of continuous 
speech signals" (IEEE Trans. on Acous., Speech and Sig. 
Proc., Vol. 36, No. 1, January 1988); and C. Murgia et 
al., "An algorithm for the estimation of glottal 
closure instants using the sequential detection of 
abrupt changes in speech signals" (Signal Processing 
VII, 1994, pages 1685-1688). 

The estimated fundamental frequency F0 forms the 
subject of a quantization, for example scalar, by a 
module 5, which provides the output multiplexer 6 with 
an index iF of quantization of the fundamental 
frequency for each frame of the signal. 

The coder uses cepstral parametric modelings to 
represent an upper envelope and a lower envelope of the 
spectrum of the audio signal. The first step of the 
cepstral transformation consists in applying a spectral 
compression function to the modulus of the spectrum of 
the signal, which function may be a logarithmic or root 
function. The module 8 of the coder thus carries out, 
for each value X(i) of the spectrum of the signal 
(0 ≤ i < N), the following transformation: 

$$LX(i) = \log(|X(i)|) \qquad (4)$$

in the case of a logarithmic compression or 

$$LX(i) = |X(i)|^{\gamma} \qquad (5)$$

in the case of a root compression, γ being an exponent 
lying between 0 and 1. 

The compressed spectrum LX of the audio signal is 
processed by a module 9 which extracts spectral 
amplitudes associated with the harmonics of the signal 
corresponding to the multiples of the estimated 
fundamental frequency F0. These amplitudes are then 
interpolated by a module 10 so as to obtain a 
compressed upper envelope denoted LX_sup. 

It should be noted that the spectral compression could 
equivalently be performed after determining the 
amplitudes associated with the harmonics. It could also 
be performed after interpolation, and this would merely 
modify the form of the interpolation functions. 

The module 9 for extracting the maxima takes account of 
any variation in the fundamental frequency over the 
analysis frame, errors which the detector 4 may make, 
as well as inaccuracies related to the discrete nature 
of the frequency sampling. To do this, the search for 
the amplitudes of the spectral peaks does not consist 
simply in taking the values LX(i) corresponding to the 
indices i such that i.Fe/2N is the frequency closest to 
a harmonic of frequency k.F0 (k ≥ 1). The spectral 
amplitude retained for a harmonic of order k is a local 
maximum of the modulus of the spectrum in the 
neighborhood of the frequency k.F0 (this amplitude is 
obtained directly in compressed form when the spectral 
compression 8 is performed before the extraction of the 
maxima 9). 
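
The spectral compression of relations (4) and (5), as carried out by the module 8, can be sketched as follows. This is an illustration under stated assumptions: the function name `compress`, the use of the natural logarithm and the small `floor` value guarding against the logarithm of zero are choices of this sketch and are not specified in the description.

```python
import math


def compress(mod_X, mode="log", gamma=0.5, floor=1e-12):
    """Compress the spectral moduli |X(i)| into LX(i).

    mode="log"  applies relation (4), LX(i) = log(|X(i)|); the natural
                logarithm and the floor are assumptions of this sketch.
    mode="root" applies relation (5), LX(i) = |X(i)|**gamma, with gamma
                lying between 0 and 1.
    """
    if mode == "log":
        return [math.log(max(m, floor)) for m in mod_X]
    if mode == "root":
        if not 0.0 < gamma < 1.0:
            raise ValueError("gamma is expected to lie between 0 and 1")
        return [m ** gamma for m in mod_X]
    raise ValueError("mode must be 'log' or 'root'")
```

Either form reduces the dynamic range of the spectrum before the harmonic amplitudes are extracted and interpolated.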

Figures 4 and 5 show an exemplary form of the 
compressed spectrum LX, where it may be seen that the 
maximum amplitudes of the harmonic peaks do not 
necessarily coincide with the amplitudes corresponding 
to the integer multiples of the estimated fundamental 
frequency F0. Since the sides of the peaks are fairly 
steep, a small error in the positioning of the 
fundamental frequency F0, amplified by the harmonic 
index k, may greatly distort the estimated upper 
envelope of the spectrum and cause poor modeling of the 
formant structure of the signal. For example, directly 
taking the spectral amplitude for the frequency 3.F0 in 
the case of figures 4 and 5 would produce a sizeable 
error in the extraction of the upper envelope in the 
neighborhood of the harmonic of order k = 3, although, 
in the example drawn, this relates to a zone of 
sizeable energy. By performing the interpolation on the 
basis of the actual maximum, this kind of error in 
estimating the upper envelope is avoided. 

In the example represented in figure 4, the 
interpolation is performed between points whose 
abscissa is the frequency corresponding to the maximum 
of the amplitude of a spectral peak, and whose ordinate 
is this maximum, before or after compression. 

The interpolation performed to calculate the upper 
envelope LX_sup is a simple linear interpolation. Of 
course, some other form of interpolation could be used 
(for example polynomial or spline). 
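
The linear interpolation of the upper envelope between anchoring points can be sketched as follows. The function name `interpolate_envelope` is hypothetical, and holding the envelope constant below the first anchor and above the last one is an assumption of this sketch, since the description does not state how the ends of the spectrum are handled.

```python
def interpolate_envelope(anchors, n_bins):
    """Linearly interpolate an envelope over the indices 0..n_bins-1.

    anchors is a list of (index, amplitude) pairs with strictly increasing
    indices lying inside 0..n_bins-1. Outside the first and last anchors
    the envelope is held constant (assumption of this sketch).
    """
    first_i, first_a = anchors[0]
    last_i, last_a = anchors[-1]
    env = [first_a if i <= first_i else last_a for i in range(n_bins)]
    for (i0, a0), (i1, a1) in zip(anchors, anchors[1:]):
        for i in range(i0, i1 + 1):
            t = (i - i0) / (i1 - i0)
            env[i] = a0 + t * (a1 - a0)
    return env
```

A polynomial or spline interpolation would only change the inner loop that fills the values between two consecutive anchors.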

In the preferred variant represented in figure 5, the 
interpolation is performed between points whose 
abscissa is a frequency k.F0 which is a multiple of the 
fundamental frequency (in fact the closest frequency in 
the discrete spectrum) and whose ordinate is the 
maximum amplitude, before or after compression, of the 
spectrum in the neighborhood of this multiple 
frequency. 



By comparing figures 4 and 5, it may be seen that the 
mode of extraction according to figure 5, which 
repositions the peaks on the harmonic frequencies, 
leads to better accuracy with regard to the amplitude 
of the peaks which will be attributed by the decoder to 
the frequencies which are multiples of the fundamental 
frequency. A slight frequency displacement may occur in 
the position of these peaks, this not being very 
significant perceptually and anyway not being avoided 
either in the case of figure 4. In the case of figure 
4, the anchoring points for the interpolation are one 
and the same as the vertices of the harmonic peaks. In 
the case of figure 5, these anchoring points must lie 
precisely at the frequencies which are multiples of the 
fundamental frequency, their amplitudes corresponding 
to those of the peaks. 

The search interval for the amplitude maximum 
associated with a harmonic of rank k is centered on the 
index i of the frequency of the FFT closest to k.F0, 
i.e. 

$$i = \left\lfloor \frac{2N \, k \, F_0}{F_e} + \frac{1}{2} \right\rfloor$$

where ⌊a⌋ designates the integer equal to or immediately 
less than the number a. The width of this search 
interval depends on the sampling frequency Fe, on the 
size 2N of the FFT and on the possible range of 
variation of the fundamental frequency. This width is 
typically of the order of ten frequency indices with 
the exemplary values considered earlier. It may be 
rendered adjustable as a function of the value F0 of 
the fundamental frequency and of the number k of the 
harmonic. 
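
The extraction of the harmonic maxima in the variant of figure 5 can be sketched as follows. This is a sketch under stated assumptions: the names `harmonic_index` and `harmonic_peaks` are hypothetical, the list LX is taken to hold the compressed spectrum for the indices 0 to N-1, and the default half-width of 5 indices is only an illustrative value consistent with a search width of the order of ten indices.

```python
import math


def harmonic_index(k, F0, Fe=8000.0, N=256):
    """Index of the 2N-point FFT frequency closest to k.F0."""
    return int(math.floor(2 * N * k * F0 / Fe + 0.5))


def harmonic_peaks(LX, F0, Fe=8000.0, N=256, half_width=5):
    """Return (index, amplitude) anchoring points, one per harmonic.

    For each harmonic of rank k, the amplitude is the maximum of LX over a
    search interval centered on the index closest to k.F0, and the abscissa
    is that central index (peaks repositioned on the harmonic frequencies).
    """
    if F0 <= 0:
        raise ValueError("F0 must be positive")
    peaks = []
    k = 1
    while True:
        centre = harmonic_index(k, F0, Fe, N)
        if centre >= N:          # beyond the useful half of the spectrum
            break
        lo = max(0, centre - half_width)
        hi = min(N - 1, centre + half_width)
        best = max(range(lo, hi + 1), key=lambda i: LX[i])
        peaks.append((centre, LX[best]))
        k += 1
    return peaks
```

In the variant of figure 4, the pair retained would instead be (best, LX[best]), so that the anchoring point stays at the vertex of the peak.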
In order to improve the resolution in the low 
frequencies and hence to more faithfully represent the 
amplitudes of the harmonics in this zone, a nonlinear 
distortion of the frequency scale is carried out on the 
compressed upper envelope by a module 12 before the 
module 13 performs the inverse fast Fourier transform 
(IFFT) providing the cepstral coefficients cx_sup. 

The nonlinear distortion allows more efficient 
minimization of the modeling error. It is, for example, 
performed on a frequency scale of Mel or Bark type. 
This distortion may possibly depend on the estimated 
fundamental frequency F0. Figure 1 illustrates the case 
of the Mel scale. The relation between the frequencies 
F of the linear spectrum, expressed in hertz, and the 
frequencies F' of the Mel scale is as follows: 

$$F' = \frac{1000}{\log_{10}(2)} \times \log_{10}\left(1 + \frac{F}{1000}\right) \qquad (6)$$
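
Relation (6) can be checked numerically with the short sketch below. The function names are hypothetical, and the inverse mapping `mel_to_hz` is not given in the description; it is obtained here simply by solving relation (6) for F.

```python
import math


def hz_to_mel(F):
    """Relation (6): frequency F in hertz mapped onto the Mel scale."""
    return 1000.0 / math.log10(2.0) * math.log10(1.0 + F / 1000.0)


def mel_to_hz(F_mel):
    """Inverse of relation (6), derived for this sketch."""
    return 1000.0 * (2.0 ** (F_mel / 1000.0) - 1.0)
```

With this relation, 0 Hz maps to 0 and 1000 Hz maps to 1000 on the Mel scale, and the scale is progressively compressed above 1000 Hz.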

In order to limit the transmission bit rate, a 
truncation of the cepstral coefficients cx_sup is 
performed. The IFFT module 13 need only calculate a 
cepstral vector of NCS cepstral coefficients of orders 
0 to NCS-1. By way of example, NCS may be equal to 16. 

Post-filtering in the cepstral domain, referred to as 
post-liftering, is applied by a module 15 to the 
compressed upper envelope LX_sup. This post-liftering 
corresponds to a manipulation of the cepstral 
coefficients cx_sup delivered by the IFFT module 13, 
which corresponds approximately to a post-filtering of 
the harmonic part of the signal by a transfer function 
having the conventional form: 

$$H(z) = \left(1 - \mu z^{-1}\right) \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (7)$$

where A(z) is the transfer function of a filter for 
linear prediction of the audio signal, γ1 and γ2 are 
coefficients lying between 0 and 1, and μ is a 
pre-emphasizing coefficient, possibly zero. The relation 
between the post-liftered coefficient of order i, 
denoted c_p(i), and the corresponding cepstral 
coefficient c(i) = cx_sup(i) delivered by the module 13 
is then: 

$$c_p(0) = c(0)$$
$$c_p(i) = \left[1 + \gamma_2^i - \gamma_1^i\right] c(i) - \frac{\mu^i}{i} \quad \text{for } i > 0 \qquad (8)$$

The optional pre-emphasizing coefficient μ may be 
controlled by setting as constraint the preserving of 
the value of the cepstral coefficient cx_sup(1) 
relating to the slope. Specifically, the value of 
c(1) = cx_sup(1) of white noise filtered by the 
pre-emphasizing filter corresponds to the pre-emphasizing 
coefficient. The latter may thus be chosen as follows: 
μ = (γ2 - γ1).c(1). 
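
The post-liftering relation and the choice of the pre-emphasizing coefficient can be illustrated by the sketch below. The function name `post_lifter` and the convention that passing `mu=None` selects μ = (γ2 - γ1).c(1) are assumptions of this sketch, and the numerical values in the usage lines are arbitrary.

```python
def post_lifter(c, gamma1, gamma2, mu=None):
    """Apply the post-liftering to the cepstral coefficients c(0..NCS-1).

    c_p(0) = c(0)
    c_p(i) = [1 + gamma2**i - gamma1**i] * c(i) - mu**i / i   for i > 0

    When mu is None it is chosen as (gamma2 - gamma1) * c(1), which
    preserves the value of the coefficient of order 1.
    """
    if mu is None:
        mu = (gamma2 - gamma1) * c[1]
    cp = [c[0]]
    for i in range(1, len(c)):
        cp.append((1.0 + gamma2 ** i - gamma1 ** i) * c[i] - mu ** i / i)
    return cp


# Arbitrary illustrative values, not taken from the description.
c = [2.0, 0.5, -0.25, 0.1]
cp = post_lifter(c, gamma1=0.9, gamma2=0.7)
```

With μ chosen in this way, the coefficient of order 1 is left unchanged by the post-liftering, while the coefficients of higher order are modified.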

After the post-lifter 15, a normalizing module 16 again 
modifies the cepstral coefficients by imposing the 
constraint of exact modeling of a point of the initial 
spectrum, which is preferably the point of greatest 
energy from among the spectral maxima extracted by the 
module 9. In practice, this normalization modifies only 
the value of the coefficient c_p(0). 

The normalizing module 16 operates as follows: it 
recalculates a value of the synthesized spectrum at the 
frequency of the maximum indicated by the module 9, by 
Fourier transform of the truncated and post-liftered 
cepstral coefficients, taking into account the 
nonlinear distortion of the frequency axis; it 
determines a normalizing gain g_N through the 
logarithmic difference between the value of the maximum 
as delivered by the module 9 and this recalculated 
value; and it adds the gain g_N to the post-liftered 
cepstral coefficient c_p(0). This normalization 
may be viewed as being part of the post-liftering. 

The post-liftered and normalized cepstral coefficients 
form the subject of a quantization by a module 18 which 
transmits corresponding quantization indices icxs to 
the output multiplexer 6 of the coder. 

The module 18 can operate by vector quantization on the
basis of cepstral vectors formed of post-liftered and
normalized coefficients, here denoted cx[n] for the
signal frame of rank n. By way of example, the cepstral
vector cx[n] of NCS = 16 cepstral coefficients cx[n,0],
cx[n,1], ..., cx[n,NCS-1] is distributed as four
cepstral subvectors each containing four coefficients
of consecutive orders. The cepstral vector cx[n] can be
processed by the means represented in figure 6, forming
part of the quantization module 18. These means
implement, for each component cx[n,i], a predictor of
the form:

$$ \mathrm{cx}_p[n,i] = (1 - \alpha(i)) \cdot \mathrm{rcx}[n,i] + \alpha(i) \cdot \mathrm{rcx}[n-1,i] \tag{9} $$

where rcx[n] designates a residual prediction vector
for the frame of rank n whose components are
respectively denoted rcx[n,0], rcx[n,1], ...,
rcx[n,NCS-1], and α(i) designates a prediction
coefficient chosen so as to be representative of an
assumed inter-frame correlation. After quantization of
the residuals, this residual vector is defined by:

$$ \mathrm{rcx}[n,i] = \frac{\mathrm{cx}[n,i] - \alpha(i) \cdot \mathrm{rcx\_q}[n-1,i]}{2 - \alpha(i)} \tag{10} $$

where rcx_q[n-1] designates the quantized residual
vector for the frame of rank n-1, whose components are
respectively denoted rcx_q[n-1,0], rcx_q[n-1,1], ...,
rcx_q[n-1,NCS-1].

The numerator of relation (10) is obtained by a
subtractor 20, whose output vector components are
divided by the quantities 2 - α(i) at 21. For
quantization purposes, the residual vector rcx[n] is
subdivided into four subvectors, corresponding to the
subdivision into four cepstral subvectors. On the basis
of a dictionary obtained by prior learning, the unit 22
undertakes the vector quantization of each subvector of
the residual vector rcx[n]. This quantization can
consist, for each subvector srcx[n], in selecting from
the dictionary the quantized subvector srcx_q[n] which
minimizes the quadratic error ||srcx[n] - srcx_q[n]||^2. The
set icxs of quantization indices icx, corresponding to
the addresses in the dictionary or dictionaries of the
quantized residual subvectors srcx_q[n], is provided to
the output multiplexer 6.
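The residual computation of relation (10) and the dictionary search can be sketched as follows. The codebook contents and sizes are placeholders (the text only states that the dictionaries result from prior learning), and the function names are hypothetical.

```python
import numpy as np

def prediction_residual(cx, rcx_q_prev, alpha):
    """Residual vector of relation (10) for the current frame.

    cx         : cepstral vector cx[n]
    rcx_q_prev : quantized residual of the previous frame, rcx_q[n-1]
    alpha      : prediction coefficients alpha(i), one per component
    """
    return (cx - alpha * rcx_q_prev) / (2.0 - alpha)

def quantize_subvectors(rcx, codebooks):
    """Quantize each subvector by exhaustive minimum-squared-error search.

    codebooks is a list of arrays of shape (entries, dim); consecutive
    slices of rcx are matched against consecutive codebooks.
    Returns the list of indices and the quantized residual vector.
    """
    indices, parts, start = [], [], 0
    for cb in codebooks:
        dim = cb.shape[1]
        sub = rcx[start:start + dim]
        err = np.sum((cb - sub) ** 2, axis=1)   # quadratic error per entry
        k = int(np.argmin(err))
        indices.append(k)
        parts.append(cb[k])
        start += dim
    return indices, np.concatenate(parts)
```

With NCS = 16 and four codebooks of dimension 4, `quantize_subvectors` returns four indices per frame, one per residual subvector.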

The unit 22 also delivers the values of the quantized
residual subvectors, which form the vector rcx_q[n].
The latter is delayed by one frame at 23, and its
components are multiplied by the coefficients α(i) at
24 so as to provide the vector to the negative input of
the subtractor 20. The latter vector is, on the other
hand, provided to an adder 25, the other input of which
receives a vector formed by the components of the
quantized residual rcx_q[n], respectively multiplied by
the quantities 1 - α(i) at 26. The adder 25 thus delivers
the quantized cepstral vector cx_q[n] which will be
recovered by the decoder.

The prediction coefficient α(i) can be optimized
separately for each of the cepstral coefficients. The
quantization dictionaries may also be optimized
separately for each of the four cepstral subvectors.
Moreover, it is possible, in a manner known per se, to
normalize the cepstral vectors before applying the
prediction/quantization scheme, on the basis of the
variance of the cepstra.

It should be noted that the above scheme for quantizing
the cepstral coefficients may be applied to only some
of the frames. For example, provision may be made for a
second mode of quantization as well as a process for
selecting that one of the two modes which minimizes a
least squares criterion with the cepstral coefficients
to be quantized, and a bit indicating which of the two
modes has been selected may be transmitted with the
frame quantization indices.

The quantized cepstral coefficients cx_sup_q = cx_q[n]
provided by the adder 25 are addressed to a module 28
which recalculates the spectral amplitudes associated
with one or more of the harmonics of the fundamental
frequency F0 (figure 1). These spectral amplitudes are,
for example, calculated in compressed form, by applying
the Fourier transform to the quantized cepstral
coefficients while taking account of the nonlinear
distortion of the frequency scale used in the cepstral
transformation. The amplitudes thus recalculated are
provided to an adaptation module 29 which compares them
with amplitudes of maxima determined by the extraction
module 9.

The adaptation module 29 controls the post-lifter 15 in
such a way as to minimize a discrepancy in modulus
between the spectrum of the audio signal and the
corresponding modulus values calculated at 28. This
discrepancy in modulus can be expressed by a sum of
absolute values of differences of amplitudes,
compressed or otherwise, corresponding to one or more
of the harmonic frequencies. This sum can be weighted
as a function of the spectral amplitudes associated
with these frequencies.

Optimally, the discrepancy in modulus taken into
account in the adaptation of the post-liftering would
take account of all the harmonics of the spectrum.
However, in order to reduce the complexity of the
optimization, the module 28 can resynthesize the
spectral amplitudes for just one or more frequencies
which are multiples of the fundamental frequency F0 and
which are selected on the basis of the magnitude of the
modulus of the spectrum in absolute value. The
adaptation module 29 can, for example, consider the
three most intense spectral peaks in the calculation of
the discrepancy in modulus to be minimized.

In another embodiment, the adaptation module 29
estimates a curve of spectral masking of the audio
signal by means of a psycho-acoustic model, and the
frequencies taken into account in the calculation of
the discrepancy in modulus to be minimized are selected
on the basis of the magnitude of the modulus of the
spectrum in relation to the masking curve (it is, for
example, possible to take the three frequencies for
which the modulus of the spectrum most exceeds the
masking curve). Various conventional methods can be
used to calculate the masking curve from the audio
signal. It is, for example, possible to use that
developed by J.D. Johnston ("Transform Coding of Audio
Signals Using Perceptual Noise Criteria", IEEE Journal
on Selected Areas in Communications, Vol. 6, No. 2,
February 1988).

To carry out the adaptation of the post-liftering, the
module 29 can use a filter identification model. A
simpler method consists in predefining a collection of
sets of post-liftering parameters, that is to say a
collection of pairs γ1, γ2 in the case of post-liftering
according to relations (8), in performing the
operations incumbent on the modules 15, 16, 18 and 28
for each of these sets of parameters, and in retaining
that of the sets of parameters which leads to the
minimum discrepancy in modulus between the spectrum of
the signal and the recalculated values. The
quantization indices provided by the module 18 are then
those which relate to the best set of parameters.
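The simpler selection method amounts to an exhaustive search over the predefined parameter sets. In the sketch below, `resynthesize` is a hypothetical callable standing for the chain of modules 15, 16, 18 and 28 (it returns the recalculated amplitudes for one parameter set), and the optional `weights` argument stands for the amplitude-dependent weighting of the sum; neither name comes from the text.

```python
import numpy as np

def modulus_discrepancy(target, resynth, weights=None):
    """Weighted sum of absolute amplitude differences at the selected peaks."""
    target = np.asarray(target, dtype=float)
    resynth = np.asarray(resynth, dtype=float)
    if weights is None:
        weights = np.ones_like(target)
    return float(np.sum(weights * np.abs(target - resynth)))

def select_post_lifter(candidates, resynthesize, target, weights=None):
    """Return the (gamma1, gamma2) pair giving the smallest discrepancy.

    candidates   : iterable of (gamma1, gamma2) pairs defined in advance
    resynthesize : hypothetical callable mapping a pair to the amplitudes
                   recalculated after post-liftering and quantization
    target       : amplitudes of the spectral maxima used as reference
    """
    return min(
        candidates,
        key=lambda p: modulus_discrepancy(target, resynthesize(p), weights),
    )
```

Restricting `target` to the three most intense peaks, as suggested above, keeps the cost of each candidate evaluation low.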

By a process similar to that for extracting the
coefficients cx_sup representing the compressed upper
envelope LX_sup of the spectrum of the signal, the
coder determines the coefficients cx_inf representing a
compressed lower envelope LX_inf. A module 30 extracts
from the compressed spectrum LX, spectral amplitudes
associated with frequencies situated in zones of the
spectrum which are intermediate with respect to the
frequencies which are multiples of the estimated
fundamental frequency F0.

In the example illustrated by figures 4 and 5, each
amplitude associated with a frequency situated in a
zone intermediate between two successive harmonics k·F0
and (k+1)·F0 corresponds simply to the modulus of the
spectrum for the frequency (k+1/2)·F0 situated in the
middle of the interval separating the two harmonics. In
another embodiment, this amplitude could be an average
of the modulus of the spectrum over a small span
surrounding this frequency (k+1/2)·F0.

A module 31 carries out an interpolation, for example
linear, of the spectral amplitudes associated with the
frequencies situated in the intermediate zones so as to
obtain the compressed lower envelope LX_inf.
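The extraction at the mid-harmonic frequencies followed by linear interpolation can be sketched as follows. The rounding of (k+1/2)·F0 to the nearest frequency bin and the constant extension of the envelope outside the first and last mid-harmonic points are assumptions of this example, as is the function name.

```python
import numpy as np

def lower_envelope(lx, f0, fe):
    """Compressed lower envelope LX_inf by linear interpolation.

    lx : compressed modulus spectrum on the N bins of a 2N-point transform
    f0 : estimated fundamental frequency (Hz)
    fe : sampling frequency (Hz)
    """
    lx = np.asarray(lx, dtype=float)
    n_bins = len(lx)
    bin_hz = fe / (2.0 * n_bins)                    # frequency step per bin
    k = np.arange(1, int(np.ceil(fe / (2.0 * f0))))  # harmonics below Fe/2
    mid_bins = np.round((k + 0.5) * f0 / bin_hz).astype(int)
    mid_bins = mid_bins[mid_bins < n_bins]
    # np.interp holds the end values constant outside [mid_bins[0], mid_bins[-1]]
    return np.interp(np.arange(n_bins), mid_bins, lx[mid_bins])
```

Averaging the modulus over a small span around each mid-harmonic frequency, as in the other embodiment mentioned, would only change how `lx[mid_bins]` is obtained.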

The cepstral transformation applied to this compressed
lower envelope LX_inf is performed according to a
frequency scale resulting from a nonlinear distortion
applied by a module 32. The IFFT module 33 calculates a
cepstral vector of NCI cepstral coefficients cx_inf of
orders 0 to NCI-1 representing the lower envelope. NCI
is a number which may be substantially smaller than
NCS, for example NCI = 4.

The nonlinear transformation of the frequency scale for
the cepstral transformation of the lower envelope can
be carried out to a scale which is finer at the high
frequencies than at the low frequencies, thereby
advantageously allowing good modeling of the unvoiced
components of the signal at the high frequencies.
However, to ensure homogeneity of representation
between the upper envelope and the lower envelope, the
same scale will preferably be adopted in the module 32
as in the module 12 (Mel in the example considered).

The cepstral coefficients cx_inf representing the
compressed lower envelope are quantized by a module 34,
which may operate in the same manner as the module 18
for quantizing the cepstral coefficients representing
the compressed upper envelope. In the case considered,
where we restricted ourselves to NCI = 4 cepstral
coefficients for the lower envelope, the vector thus
formed is subjected to a prediction residual vector
quantization performed by means identical to those
represented in figure 6 but without subdivision into
subvectors. The quantization index icx = icxi
determined by the vector quantizer 22 for each frame
relating to the coefficients cx_inf is provided to the
output multiplexer 6 of the coder.

The coder represented in figure 1 does not comprise any
particular device for coding the phases of the spectrum
at the harmonics of the audio signal.

On the other hand, it comprises means 36-40 for coding
time information related to the phase of the
nonharmonic component represented by the lower
envelope.

A spectral decompression module 36 and an IFFT module
37 form a temporal estimate of the frame of the
nonharmonic component. The module 36 applies a
decompression function which is the reciprocal of the
compression function applied by the module 8 (that is
to say an exponential or a 1/γ power function) to the
compressed lower envelope LX_inf produced by the
interpolation module 31. This provides the modulus of
the estimated frame of the nonharmonic component, whose
phase is taken equal to that φ_X of the spectrum of the
signal X over the frame. The inverse Fourier transform
performed by the module 37 provides the estimated frame
of the nonharmonic component.

The module 38 subdivides this estimated frame of the
nonharmonic component into several time segments. The
frame delivered by the module 37 being made up of
2N = 512 weighted samples, as illustrated by figure 3,
the module 38 considers only the first N/2 = 128
samples and the last N/2 = 128 samples, and subdivides
them, for example, into eight segments of 32
consecutive samples each representing 4 ms of signal.

For each segment, the module 38 calculates the energy
equal to the sum of the squares of the samples, and
forms a vector E1 formed of eight positive real
components equal to the eight calculated energies. The
largest of these eight energies, denoted EM, is also
determined so as to be provided, with the vector E1, to
a normalizing module 39. The latter divides each
component of the vector E1 by EM, so that the
normalized vector Emix is formed of eight components
lying between 0 and 1. It is this normalized vector
Emix, or weighting vector, which is subjected to the
quantization by the module 40. The latter can carry out
a vector quantization with a dictionary determined
during prior learning. The quantization index iEm is
provided by the module 40 to the output multiplexer 6
of the coder.
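The computation of the weighting vector Emix by the modules 38 and 39 can be sketched as follows. The order in which the first and last 128 samples are concatenated is an assumption of this example (the text refers to figure 3 for the arrangement of the frame), and the guard for an all-zero frame is an addition.

```python
import numpy as np

def energy_weighting(frame, seg_len=32):
    """Return (Emix, EM) for a frame of 2N samples delivered by the IFFT.

    Only the first N/2 and last N/2 samples are kept, then split into
    segments of seg_len samples (eight segments when 2N = 512).
    """
    frame = np.asarray(frame, dtype=float)
    half_n = len(frame) // 4                         # N/2 = 128 when 2N = 512
    kept = np.concatenate([frame[:half_n], frame[-half_n:]])
    energies = np.sum(kept.reshape(-1, seg_len) ** 2, axis=1)
    em = energies.max()
    if em == 0.0:                                    # silent frame (added guard)
        return np.zeros_like(energies), em
    return energies / em, em
```

Each component of the returned vector lies between 0 and 1, and at least one component equals 1 for a non-silent frame.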

Figure 7 shows a variant embodiment of the means
employed by the coder of figure 1 to determine the
energy weighting vector Emix for the frame of the
nonharmonic component. The spectral decompression and IFFT
modules 36, 37 operate like those which bear the same
references in figure 1. A selection module 42 is added
so as to determine the value of the modulus of the
spectrum subjected to the inverse Fourier transform 37.
On the basis of the estimated fundamental frequency F0,
the module 42 identifies harmonic regions and
nonharmonic regions of the spectrum of the audio signal.
For example, a frequency will be regarded as belonging
to a harmonic region if it is located in a frequency
interval centered on a harmonic k·F0 and of width
corresponding to a synthesized spectral line width, and
to a nonharmonic region otherwise. In the nonharmonic
regions, the complex signal subjected to the IFFT 37 is
equal to the value of the spectrum, that is to say its
modulus and its phase correspond to the values |X| and
φ_X provided by the FFT module 3. In the harmonic
regions, this complex signal has the same phase φ_X as
the spectrum and a modulus given by the lower envelope
after spectral decompression 36. Proceeding thus
according to figure 7 achieves more accurate modeling
of the nonharmonic regions.

The decoder represented in figure 8 comprises an input
demultiplexer 45 which extracts from the binary stream
Φ, emanating from a coder according to figure 1, the
quantization indices iF, icxs, icxi, iEm for the
fundamental frequency F0, the cepstral coefficients
representing the compressed upper envelope, the
coefficients representing the compressed lower
envelope, and the weighting vector Emix, and
distributes them respectively to modules 46, 47, 48 and
49. These modules 46-49 comprise quantization
dictionaries similar to those of the modules 5, 18, 34
and 40 of figure 1, so as to restore the values of the
quantized parameters. The modules 47 and 48 have
dictionaries so as to form the quantized prediction
residuals rcx_q[n], and they deduce therefrom the
quantized cepstral vectors cx_q[n] with elements
identical to the elements 23-26 of figure 6. These
quantized cepstral vectors cx_q[n] provide the cepstral
coefficients cx_sup_q and cx_inf_q processed by the
decoder.

A module 51 calculates the fast Fourier transform of
the cepstral coefficients cx_sup for each signal frame.
The frequency scale of the compressed spectrum
resulting therefrom is modified nonlinearly by a module
52 applying the nonlinear transformation reciprocal to
that of the module 12 of figure 1, and which provides
the estimate LX_sup of the compressed upper envelope. A
spectral decompression of LX_sup, carried out by a
module 53, provides the upper envelope X_sup comprising
the estimated values of the modulus of the spectrum at
the frequencies which are multiples of the fundamental
frequency F0. The module 54 synthesizes the spectral
estimate X_v of the harmonic component of the audio
signal, through a sum of spectral lines centered on the
frequencies which are multiples of the fundamental
frequency F0 and whose amplitudes (in modulus) are
those given by the upper envelope X_sup.

Although the digital input stream Φ does not comprise
any specific information regarding the phase of the
spectrum of the signal at the harmonics of the
fundamental frequency, the decoder of figure 8 is
capable of extracting information regarding this phase
from the cepstral coefficients cx_sup_q representing
the compressed upper envelope. This phase information
is used to assign a phase φ(k) to each of the spectral
lines determined by the module 54 in the estimate of
the harmonic component of the signal.

As a first approximation, the speech signal may be
regarded as being of minimum phase. Moreover, it is
known that the minimum phase information may be deduced
easily from cepstral modeling. This minimum phase
information is therefore calculated for each harmonic
frequency. The minimum phase assumption signifies that
the energy of the synthesized signal is localized at
the start of each period of the fundamental frequency
F0.

In order to be closer to a real speech signal, slight
dispersion is introduced by means of a specific
post-liftering of the cepstra during synthesis of the phase.
With this post-liftering, performed by the module 55 of
figure 8, it is possible to emphasize the formant
resonances of the envelope and hence to control the
dispersion of the phases. This post-liftering is, for
example, of the form (8).

To limit the phase breaks, it is preferable to smooth
the post-liftered cepstral coefficients, this being
performed by the module 56. The module 57 deduces from
the post-liftered and smoothed cepstral coefficients
the minimum phase assigned to each spectral line
representing a harmonic peak of the spectrum.

The operations performed by the modules 56, 57 for
smoothing and extracting the minimum phase are
illustrated by the flowchart of figure 9. The module 56
examines the variations in the cepstral coefficients so
as to apply lesser smoothing in the presence of abrupt
variations than in the presence of slow variations. To
do this, it performs the smoothing of the cepstral
coefficients by means of a forget factor λc chosen as a
function of a comparison between a threshold d_th and a
distance d between two successive sets of post-liftered
cepstral coefficients. The threshold d_th is itself
adapted as a function of the variations of the cepstral
coefficients.

The first step 60 consists in calculating the distance
d between the two successive vectors relating to frames
n-1 and n. These vectors, here denoted cxp[n-1] and
cxp[n], correspond for each frame to the collection of
NCS post-liftered cepstral coefficients representing
the compressed upper envelope. The distance used may in
particular be the Euclidean distance between the two
vectors or else a quadratic distance.

Two smoothings are firstly performed, respectively by
means of forget factors λmin and λmax, so as to determine
a minimum distance d_min and a maximum distance d_max. The
threshold d_th is then determined in step 70 as being
situated between the minimum and maximum distances d_min,
d_max: d_th = β·d_max + (1 - β)·d_min, the coefficient β
being, for example, equal to 0.5.

In the example represented, the forget factors λmin and
λmax are themselves selected from among two distinct
values, respectively λmin1, λmin2 and λmax1, λmax2 lying
between 0 and 1, the values λmin1, λmax1 each being
substantially nearer to 0 than the values λmin2, λmax2.
If d > d_min (test 61), the forget factor λmin is equal to
λmin1 (step 62); otherwise, it is taken equal to λmin2
(step 63). In step 64, the minimum distance d_min is
taken equal to λmin·d_min + (1 - λmin)·d. If d > d_max (test
65), the forget factor λmax is equal to λmax1 (step 66);
otherwise, it is taken equal to λmax2 (step 67). In
step 68, the maximum distance d_max is taken equal to
λmax·d_max + (1 - λmax)·d.

If the distance d between the two consecutive cepstral
vectors is greater than the threshold d_th (test 71),
then a value λc1 relatively close to 0 is adopted for
the forget factor λc (step 72). In this case, the
corresponding signal is regarded as being of
nonstationary type, so that there is no need to keep a
large memory of the earlier cepstral coefficients. If
d ≤ d_th, a value λc2 which is not as close to 0 is
adopted in step 73 for the forget factor λc, so as to
further smooth the cepstral coefficients. The smoothing
is performed in step 74, where the vector cxl[n] of
smoothed coefficients for the current frame n is
determined by:

$$ \mathrm{cxl}[n] = \lambda_c \cdot \mathrm{cxl}[n-1] + (1 - \lambda_c) \cdot \mathrm{cxp}[n] \tag{11} $$
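Steps 60 to 74 of the flowchart can be summarized by the sketch below. The numerical forget factors, the zero initial state and the class name are assumptions of this example; the tests follow the wording of the description (test 61, test 65 and test 71) and the Euclidean distance is used for step 60.

```python
import numpy as np

class CepstralSmoother:
    """Adaptive smoothing of post-liftered cepstral vectors (steps 60-74)."""

    def __init__(self, ncs, lam_min=(0.1, 0.9), lam_max=(0.1, 0.9),
                 lam_c=(0.1, 0.7), beta=0.5):
        # Each pair holds (value nearer to 0, value farther from 0); all
        # numerical values here are hypothetical.
        self.lam_min, self.lam_max, self.lam_c = lam_min, lam_max, lam_c
        self.beta = beta
        self.d_min = 0.0
        self.d_max = 0.0
        self.prev_cxp = np.zeros(ncs)
        self.cxl = np.zeros(ncs)

    def update(self, cxp):
        cxp = np.asarray(cxp, dtype=float)
        d = float(np.linalg.norm(cxp - self.prev_cxp))                # step 60
        lam = self.lam_min[0] if d > self.d_min else self.lam_min[1]  # 61-63
        self.d_min = lam * self.d_min + (1.0 - lam) * d               # step 64
        lam = self.lam_max[0] if d > self.d_max else self.lam_max[1]  # 65-67
        self.d_max = lam * self.d_max + (1.0 - lam) * d               # step 68
        d_th = self.beta * self.d_max + (1.0 - self.beta) * self.d_min  # step 70
        lam_c = self.lam_c[0] if d > d_th else self.lam_c[1]          # 71-73
        self.cxl = lam_c * self.cxl + (1.0 - lam_c) * cxp             # relation (11)
        self.prev_cxp = cxp
        return self.cxl
```

One instance is kept per decoded stream and `update` is called once per frame with the post-liftered vector cxp[n].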

The module 57 then calculates the minimum phases φ(k)
associated with the harmonics k·F0. In a known manner,
the minimum phase for a harmonic of order k is given
by:

$$ \varphi(k) = -2 \sum_{m=1}^{NCS-1} \mathrm{cxl}[n,m] \cdot \sin\left( 2\pi m k F_0 / F_e \right) \tag{12} $$

where cxl[n,m] designates the smoothed cepstral
coefficient of order m for frame n.

In step 75, the harmonic index k is initialized to 1.
To initialize the calculation of the minimum phase
assigned to harmonic k, the phase φ(k) and the cepstral
index m are initialized to 0 and 1 respectively in
step 76. In step 77, the module 57 adds the quantity
-2·cxl[n,m]·sin(2πmk·F0/Fe) to the phase φ(k). The
cepstral index m is incremented in step 78 and compared
with NCS in step 79. Steps 77 and 78 are repeated so
long as m < NCS. When m = NCS, the calculation of the
minimum phase is terminated for harmonic k, and the
index k is incremented in step 80. The calculation of
minimum phases 76-79 is rerun for the next harmonic so
long as k·F0 < Fe/2 (test 81).
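The double loop of steps 75 to 81 computes relation (12) for every harmonic below Fe/2. A vectorized sketch is given below; the function name is hypothetical and the loop bounds are those stated in the description.

```python
import numpy as np

def minimum_phases(cxl, f0, fe):
    """Minimum phase phi(k) of relation (12) for each harmonic k*F0 < Fe/2.

    cxl : smoothed cepstral coefficients cxl[n, 0..NCS-1] of the frame
    Returns the harmonic indices k and the corresponding phases.
    """
    cxl = np.asarray(cxl, dtype=float)
    m = np.arange(1, len(cxl))                       # cepstral orders 1..NCS-1
    k = np.arange(1, int(np.ceil(fe / (2.0 * f0))))  # harmonics with k*F0 < Fe/2
    angles = 2.0 * np.pi * np.outer(k, m) * f0 / fe
    phi = -2.0 * np.sin(angles) @ cxl[1:]
    return k, phi
```

The coefficient of order 0 plays no part in the result, in agreement with the sum of relation (12) starting at m = 1.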

In the exemplary embodiment according to figure 8, the
module 54 takes account of a constant phase over the
width of each spectral line, equal to the minimum phase
φ(k) provided for the corresponding harmonic k by the
module 57.

The estimate X_v of the harmonic component is
synthesized by summation of spectral lines positioned
at the harmonic frequencies of the fundamental
frequency F0. During this synthesis, it is possible to
position the spectral lines on the frequency axis with
a higher resolution than the resolution of the Fourier
transform. To do this, a reference spectral line is
precalculated once and for all according to the higher
resolution. This calculation can consist of a Fourier
transform of the analysis window f_A with a transform
size of 16 384 points, achieving a resolution of 0.5 Hz
per point. The synthesis of each harmonic line is then
performed by the module 54 by positioning on the
frequency axis the reference line with high resolution,
and by undersampling this reference spectral line so as
to reduce to the resolution of 15.625 Hz of the Fourier
transform on 512 points. This enables the spectral line
to be positioned accurately.

For the determination of the lower envelope, the FFT
module 85 of the decoder of figure 8 receives the NCI
quantized cepstral coefficients cx_inf_q of orders 0 to
NCI - 1, and it advantageously supplements them with
the NCS - NCI cepstral coefficients cx_sup_q of order
NCI to NCS - 1 representing the upper envelope.
Specifically, it may be estimated that, as a first
approximation, the fast variations of the compressed
lower envelope are well reproduced by those of the
compressed upper envelope. In another embodiment, the
FFT module 85 could consider only the NCI cepstral
parameters cx_inf_q.

The module 86 converts the frequency scale in a manner
reciprocal to the conversion carried out by the module
32 of the coder, so as to restore the estimate LX_inf
of the compressed lower envelope, subjected to the
spectral decompression module 87. At the output of the
module 87, the decoder is furnished with a lower
envelope X_inf comprising the values of the modulus of
the spectrum in the valleys situated between the
harmonic peaks.

This envelope X_inf will modulate the spectrum of a
noise frame whose phase is processed as a function of
the quantized weighting vector Emix extracted by the
module 49. A generator 88 delivers a normalized noise
frame whose 4-ms segments are weighted in a module 89
in accordance with the normalized components of the
vector Emix provided by the module 49 for the current
frame. This noise is white noise high-pass filtered so
as to take account of the low level which in principle
the unvoiced component has at the low frequencies. On
the basis of the energy-weighted noise, the module 90
forms frames of 2N = 512 samples by applying the
analysis window f_A, the insertion of 256 zero samples
and the circular permutation for phase compensation in
accordance with what was explained with reference to
figure 3. The Fourier transform of the resulting frame
is calculated by the FFT module 91.

The spectral estimate X_uv of the nonharmonic component
is determined by the spectral synthesis module 92 which
performs a frequency-by-frequency weighting. This
weighting consists in multiplying each complex spectral
value provided by the FFT module 91 by the value of the
lower envelope X_inf obtained for the same frequency by
the spectral decompression module 87.

The spectral estimates X_v, X_uv of the harmonic (voiced
in the case of a speech signal) and nonharmonic (or
unvoiced) components are combined by a mixing module 95
controlled by a module 96 for analyzing the degree of
harmonicity (or of voicing) of the signal.

The organization of these modules 95, 96 is illustrated
by figure 10. The analysis module 96 comprises a unit
97 for estimating a frequency-dependent degree of
voicing W from which are calculated four
frequency-dependent gains, namely two gains g_v, g_uv controlling
the relative magnitude of the harmonic and nonharmonic
components in the synthesized signal, and two gains
g_v_φ, g_uv_φ used to add noise to the phase of the
harmonic component.

The degree of voicing W(i) is a continuously varying
value lying between 0 and 1 determined for each
frequency index i (0 ≤ i < N) as a function of the
upper envelope X_sup(i) and of the lower envelope
X_inf(i) which are obtained for this frequency i by the
decompression modules 53, 87. The degree of voicing
W(i) is estimated by the unit 97 for each frequency
index i corresponding to a harmonic of the fundamental
frequency F0, namely

$$ i = \left\lfloor 2N k \frac{F_0}{F_e} + \frac{1}{2} \right\rfloor \quad \text{for } k = 1, 2, \ldots, $$

by an increasing function of the ratio of the upper
envelope X_sup to the lower envelope X_inf at this
frequency, for example according to the formula:

$$ W(i) = \min\left\{ 1,\ \frac{10 \cdot \log_{10}\left[ X\_sup(i) / X\_inf(i) \right]}{Vth(F_0)} \right\} \tag{13} $$



The threshold Vth(F0) corresponds to the average
dynamic swing calculated over a purely voiced synthetic
spectrum at the fundamental frequency. It is
advantageously chosen to be dependent on the
fundamental frequency F0.

The degree of voicing W(i) for a frequency other than 
the harmonic frequencies is obtained simply as being 
equal to that estimated for the closest harmonic. 
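The estimation of the degree of voicing can be sketched as follows. The threshold is passed as a plain number `vth` because the dependence Vth(F0) is not specified here, and the clipping at 0 is an added guard for the case where the upper envelope falls below the lower one; both are assumptions of this example.

```python
import numpy as np

def degree_of_voicing(x_sup, x_inf, f0, fe, vth):
    """Degree of voicing W(i) for the N bins of a 2N-point spectrum.

    W is evaluated at the bins nearest to the harmonics k*F0 and copied
    to every other bin from the closest harmonic.
    """
    x_sup = np.asarray(x_sup, dtype=float)
    x_inf = np.asarray(x_inf, dtype=float)
    n = len(x_sup)
    k = np.arange(1, int(np.ceil(fe / (2.0 * f0))))
    idx = np.floor(2 * n * k * f0 / fe + 0.5).astype(int)   # harmonic bins
    idx = idx[idx < n]
    swing_db = 10.0 * np.log10(x_sup[idx] / x_inf[idx])
    w_harm = np.clip(swing_db / vth, 0.0, 1.0)              # min{1, ...}, floored at 0
    nearest = np.argmin(np.abs(np.arange(n)[:, None] - idx[None, :]), axis=1)
    return w_harm[nearest]
```

With a 10 dB swing between the two envelopes and `vth = 20`, every bin receives W = 0.5.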

The gain g_v(i), which depends on the frequency, is
obtained by applying a nonlinear function to the degree
of voicing W(i) (block 98). This nonlinear function
has, for example, the form represented in figure 11:

$$ g_v(i) = \begin{cases} 0 & \text{if } 0 \le W(i) \le W1 \\ \dfrac{W(i) - W1}{W2 - W1} & \text{if } W1 < W(i) < W2 \\ 1 & \text{if } W2 \le W(i) \le 1 \end{cases} \tag{14} $$

the thresholds W1, W2 being such that 0 < W1 < W2 < 1.
The gain g_uv can be calculated in a similar manner to
the gain g_v (the sum of the two gains g_v, g_uv being
constant, for example equal to 1), or deduced simply
from the latter through the relation g_uv(i) = 1 - g_v(i),
as shown diagrammatically by the subtractor 99 in
figure 10.
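Relation (14) is a clipped linear ramp, which can be written compactly. The threshold values in this sketch are placeholders; the description only requires 0 < W1 < W2 < 1.

```python
import numpy as np

def voicing_gains(w, w1=0.3, w2=0.7):
    """Gains g_v and g_uv = 1 - g_v from the degree of voicing W (relation (14)).

    w1 and w2 are hypothetical threshold values.
    """
    w = np.asarray(w, dtype=float)
    g_v = np.clip((w - w1) / (w2 - w1), 0.0, 1.0)
    return g_v, 1.0 - g_v
```

The two returned arrays always sum to 1 at each frequency index, as in the variant using the subtractor 99.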

It is beneficial to be able to add noise to the phase
of the harmonic component of the signal at a given
frequency if the analysis of the degree of voicing
shows that the signal is actually of nonharmonic type
at this frequency. To do this, the phase φ'_v of the
mixed harmonic component is the result of a linear
combination of the phases φ_v, φ_uv of the harmonic and
nonharmonic components X_v, X_uv synthesized by the
modules 54, 92.

The gains g_v_φ, g_uv_φ respectively applied to these
phases are calculated from the degree of voicing W and
also weighted as a function of the frequency index i,
given that the adding of noise to the phase is actually
useful only beyond a certain frequency.

A first gain g_v1_φ is calculated by applying a nonlinear
function to the degree of voicing W(i), as shown
diagrammatically by the block 100 in figure 10. This
nonlinear function can have the form represented in
figure 12:

$$ g_{v1\_\varphi}(i) = \begin{cases} G1 & \text{if } 0 \le W(i) \le W3 \\ G1 + (1 - G1) \dfrac{W(i) - W3}{W4 - W3} & \text{if } W3 < W(i) < W4 \\ 1 & \text{if } W4 \le W(i) \le 1 \end{cases} \tag{15} $$

the thresholds W3 and W4 being such that 0 < W3 < W4
< 1, and the minimum gain G1 lying between 0 and 1.

A multiplier 101 multiplies for each frequency of index
i the gain g_v1_φ by another gain g_v2_φ dependent only on
the frequency index i, so as to form the gain g_v_φ(i).
The gain g_v2_φ(i) depends nonlinearly on the frequency
index i, for example as indicated in figure 13:

$$ g_{v2\_\varphi}(i) = \begin{cases} 1 & \text{if } 0 \le i \le i1 \\ 1 - (1 - G2) \dfrac{i - i1}{i2 - i1} & \text{if } i1 < i < i2 \\ G2 & \text{if } i2 \le i < N \end{cases} \tag{16} $$

the indices i1 and i2 being such that 0 < i1 < i2 < N,
and the minimum gain G2 lying between 0 and 1. The gain
g_uv_φ(i) can be calculated simply as being equal to
1 - g_v_φ(i) = 1 - g_v1_φ(i)·g_v2_φ(i) (subtractor 102 of
figure 10).
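The two phase gains can be computed together, as in the sketch below. All threshold and gain values (W3, W4, G1, i1, i2, G2) are placeholders chosen for illustration, and the function name is hypothetical.

```python
import numpy as np

def phase_gains(w, w3=0.2, w4=0.6, g1=0.3, i1=64, i2=192, g2=0.2):
    """Gains g_v_phi and g_uv_phi = 1 - g_v_phi for frequency indices 0..N-1.

    w holds the degree of voicing W(i) for the N frequency indices.
    """
    w = np.asarray(w, dtype=float)
    i = np.arange(len(w))
    # Relation (15): ramp from G1 up to 1 as W goes from W3 to W4
    g_v1 = g1 + (1.0 - g1) * np.clip((w - w3) / (w4 - w3), 0.0, 1.0)
    # Relation (16): ramp from 1 down to G2 as i goes from i1 to i2
    g_v2 = 1.0 - (1.0 - g2) * np.clip((i - i1) / (i2 - i1), 0.0, 1.0)
    g_v_phi = g_v1 * g_v2                     # multiplier 101
    return g_v_phi, 1.0 - g_v_phi             # subtractor 102
```

For a fully voiced frame (W = 1 everywhere) the harmonic phase keeps its full weight at low frequencies and only the weight G2 above the index i2.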

The complex spectrum Y of the synthesized signal is
produced by the mixing module 95, which carries out the
following mixing relation, for 0 ≤ i < N:

$$ Y(i) = g_v(i) \cdot |X_v(i)| \cdot \exp\left[ j \varphi'_v(i) \right] + g_{uv}(i) \cdot X_{uv}(i) \tag{17} $$
$$ \text{with } \varphi'_v(i) = g_{v\_\varphi}(i) \cdot \varphi_v(i) + g_{uv\_\varphi}(i) \cdot \varphi_{uv}(i) \tag{18} $$

where φ_v(i) designates the argument of the complex
number X_v(i) provided by the module 54 for the
frequency of index i (block 104 of figure 10), and
φ_uv(i) designates the argument of the complex number
X_uv(i) provided by the module 92 (block 105 of
figure 10). This combination is carried out by the
multipliers 106-110 and the adders 111-112 represented
in figure 10.
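Relations (17) and (18) translate directly into a few array operations. In this sketch the four gains are assumed to be arrays (or scalars) already computed for the N frequency indices; the function name is hypothetical.

```python
import numpy as np

def mix_spectra(x_v, x_uv, g_v, g_uv, g_v_phi, g_uv_phi):
    """Mixed complex spectrum Y(i) for 0 <= i < N.

    x_v, x_uv : complex spectral estimates of the harmonic and
                nonharmonic components
    """
    x_v = np.asarray(x_v, dtype=complex)
    x_uv = np.asarray(x_uv, dtype=complex)
    # Relation (18): linear combination of the two phases
    phi_mixed = g_v_phi * np.angle(x_v) + g_uv_phi * np.angle(x_uv)
    # Relation (17): harmonic modulus with mixed phase, plus weighted noise
    return g_v * np.abs(x_v) * np.exp(1j * phi_mixed) + g_uv * x_uv
```

With g_v = 1, g_uv = 0, g_v_φ = 1 and g_uv_φ = 0 the output reduces to X_v, and with g_v = 0 and g_uv = 1 it reduces to X_uv.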

The mixed spectrum Y(i) for 0 ≤ i < 2N (with Y(2N-1-i)
= Y(i)) is then transformed into the time domain by the
IFFT module 115 (figure 8). Only the first N/2 = 128
and the last N/2 = 128 samples of the frame of 2N = 512
samples produced by the module 115 are retained, and
the circular permutation inverse to that illustrated by
figure 3 is applied to obtain the synthesized frame of
N = 256 samples weighted by the analysis window f_A.

The frames obtained successively in this manner are
finally processed by the temporal synthesis module 116
which forms the decoded audio signal x̂.

The temporal synthesis module 116 performs an overlap
sum of frames modified with respect to those evaluated
successively at the output of the module 115. The
modification may be viewed in two steps illustrated by
figures 14 and 15 respectively.

The first step (figure 14) consists in multiplying each
frame 2' delivered by the IFFT module 115 by a window
1/f_A inverse to the analysis window f_A employed by the
module 1 of the coder. The samples of the frame 2"
resulting therefrom are therefore uniformly weighted.






The second step (figure 15) consists in multiplying the
samples of this frame 2" by a synthesis window f_S
satisfying the following properties:

$$ f_S(N-L+i) + f_S(i) = A \quad \text{for } 0 \le i < L \tag{19} $$
$$ f_S(i) = A \quad \text{for } L \le i < N-L \tag{20} $$

where A designates an arbitrary positive constant, for
example A = 1. The synthesis window f_S(i) increases
progressively from 0 to A for i going from 0 to L. It
is, for example, a raised half-sinusoid:

$$ f_S(i) = \frac{A}{2} \cdot \left( 1 - \cos\left[ (i + 1/2)\pi / L \right] \right) \quad \text{for } 0 \le i < L \tag{21} $$

After having reweighted each frame 2" by the synthesis
window $f_s$, the module 116 positions the successive
frames with their time shifts of M = 160 samples and
their time overlaps of L = 96 samples, then it sums the
frames thus positioned over time. Owing to the
properties (19) and (20) of the synthesis window $f_s$,
each sample of the decoded audio signal $\hat{x}$ thus obtained
is assigned a uniform global weight, equal to A. This
global weight originates from the contribution of a
single frame if the sample has in this frame a rank i
such that $L \le i < N - L$, and comprises the summed
contributions of two successive frames if $0 \le i < L$
or $N - L \le i < N$.
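
The uniform global weight can be checked numerically. The following is a minimal sketch, assuming the raised half-sinusoid of relation (21) for the rising edge and deducing the falling edge from relation (19); the function names are hypothetical.

```python
import math

N, M, L, A = 256, 160, 96, 1.0   # frame size, time shift, overlap, target weight

def synthesis_window(i):
    """Synthesis window f_s(i) for 0 <= i < N, built from relations (19)-(21)."""
    if i < L:
        # Rising edge: raised half-sinusoid, relation (21).
        return A / 2 * (1 - math.cos((i + 0.5) * math.pi / L))
    if i < N - L:
        # Flat part, relation (20).
        return A
    # Falling edge, deduced from relation (19): f_s(N-L+i) = A - f_s(i).
    return A - synthesis_window(i - (N - L))

def overlap_add_weights(num_frames):
    """Sum the synthesis windows of num_frames frames shifted by M samples."""
    total = [0.0] * (M * (num_frames - 1) + N)
    for n in range(num_frames):
        for i in range(N):
            total[n * M + i] += synthesis_window(i)
    return total

weights = overlap_add_weights(4)
# The first and last L samples are covered by a single edge only, so they
# are excluded from the uniformity check.
interior = weights[L:len(weights) - L]
```

Every value in `interior` equals A up to rounding error, whether the sample is covered by one frame or by two overlapping frames.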

It is thus possible to perform the temporal synthesis
in a simple manner even if, as in the case considered,
the overlap L between two successive frames is smaller
than half the size N of these frames.

The two steps set forth above for modifying the signal
frames may be merged into a single step. It is
sufficient to precalculate a compound window
$f_c(i) = f_s(i)/f_A(i)$ and simply to multiply the frames 2'
of N = 256 samples delivered by the module 115 by the
compound window $f_c$ before performing the overlap
summation.
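
The precalculation of the compound window can be sketched as follows, assuming a symmetric Hamming analysis window (one possible choice of $f_A$) and the synthesis window of relations (19) to (21). The names `hamming`, `f_s`, `compound` and `apply_compound` are hypothetical.

```python
import math

N, L, A = 256, 96, 1.0

def hamming(i):
    """Assumed analysis window f_A: symmetric Hamming window over N samples."""
    return 0.54 - 0.46 * math.cos(2 * math.pi * i / (N - 1))

def f_s(i):
    """Synthesis window per relations (19)-(21)."""
    if i < L:
        return A / 2 * (1 - math.cos((i + 0.5) * math.pi / L))
    if i < N - L:
        return A
    return A - f_s(i - (N - L))

# Compound window f_c(i) = f_s(i) / f_A(i), computed once.
# The Hamming window never reaches zero, so the division is safe.
compound = [f_s(i) / hamming(i) for i in range(N)]

def apply_compound(frame):
    """Single-step reweighting of a frame 2' before the overlap summation."""
    return [c * x for c, x in zip(compound, frame)]
```

Applying `apply_compound` to a frame that carries only the analysis weighting returns the synthesis window itself, which shows that the two steps have indeed been merged.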
Figure 16 shows the shape of the compound window $f_c$ in
the case where the analysis window $f_A$ is a Hamming
window and the synthesis window $f_s$ has the form given
by relations (19) to (21).

Other forms of the synthesis window $f_s$ satisfying
relations (19) and (20) may be employed. In the variant
of figure 17, it is a piecewise affine function defined
by:

$$f_s(i) = A \cdot i / L \quad \text{for } 0 \le i < L \tag{22}$$

In order to improve the quality of coding of the audio
signal, the coder of figure 1 can increase the rate of
formation and of analysis of the frames, so as to
transmit more quantization parameters to the decoder.
In the frame structure represented in figure 2, a frame
of N = 256 samples (32 ms) is formed every 20 ms. These
frames of 256 samples could be formed at a higher rate,
for example every 10 ms, two successive frames then having a
shift of M/2 = 80 samples and an overlap of 176
samples.
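
The shift and overlap figures follow from simple arithmetic, sketched below. The sampling frequency is not restated in this passage; it is deduced here from the statement that 256 samples last 32 ms, and the function name is hypothetical.

```python
N = 256          # samples per frame
FRAME_MS = 32    # frame duration given in the text

# Sampling frequency implied by "256 samples (32 ms)".
fe_hz = N * 1000 // FRAME_MS

def frame_layout(period_ms):
    """Return (shift, overlap) in samples for frames formed every period_ms."""
    shift = fe_hz * period_ms // 1000
    overlap = N - shift
    return shift, overlap
```

`frame_layout(20)` gives the shift M = 160 and the overlap L = 96 of figure 2, and `frame_layout(10)` gives the shift of 80 samples and the overlap of 176 samples mentioned above.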

Under these conditions, it is possible to transmit the
complete sets of quantization parameters iF, icxs,
icxi, iEm for just one subcollection of frames, and to
transmit, for the other frames, parameters making it
possible to perform a suitable interpolation at the
level of the decoder. In the example envisaged
hereinabove, the subcollection for which complete
parameter sets are transmitted may consist of the
frames of integer rank n, whose periodicity is
$M/F_e$ = 20 ms, and the frames for which an interpolation
is performed may be those of half-integer rank n + 1/2
which are shifted by 10 ms with respect to the frames
of the subcollection.

In the embodiment illustrated by figure 18, the
notation cx_q[n-1] and cx_q[n] designates quantized
cepstral vectors determined, for two successive frames
of integer rank, by the quantization module 18 and/or
by the quantization module 34. These vectors comprise,
for example, four consecutive cepstral coefficients
each. They could also comprise more cepstral
coefficients.

A module 120 performs an interpolation of these two
cepstral vectors cx_q[n-1] and cx_q[n] so as to
estimate an intermediate value cx_i[n-1/2]. The
interpolation performed by the module 120 can be a
simple arithmetic average of the vectors cx_q[n-1] and
cx_q[n]. As a variant, the module 120 could apply a
more sophisticated interpolation formula, for example
polynomial, based also on the cepstral vectors obtained
for frames earlier than frame n-1. Moreover, if more
than one interpolated frame is interposed between two
consecutive frames of integer rank, the interpolation
takes account of the relative position of each
interpolated frame.

With the aid of the means described above, the coder
also calculates the cepstral coefficients cx[n-1/2]
relating to the frame of half-integer rank. In the case
of the upper envelope, these cepstral coefficients are
those provided by the IFFT module 13 after
post-liftering 15 (for example with the same post-liftering
coefficients as for the previous frame n-1) and
normalization 16. In the case of the lower envelope,
the cepstral coefficients cx[n-1/2] are those delivered
by the IFFT module 33.

A subtractor 121 forms the difference ecx[n-1/2]
between the cepstral coefficients cx[n-1/2] calculated
for the frame of half-integer rank and the coefficients
cx_i[n-1/2] estimated by interpolation. This difference
is provided to a quantization module 122 which
addresses quantization indices icx[n-1/2] to the output
multiplexer 6 of the coder. The module 122 operates,
for example, by vector quantization of the
interpolation errors ecx[n-1/2] determined successively
for the frames of half-integer rank.

This quantization of the interpolation error can be
performed by the coder for each of the NCS + NCI
cepstral coefficients used by the decoder, or for just
some of them, typically those of smallest orders.

The corresponding means of the decoder are illustrated
by figure 19. The decoder operates essentially like
that described with reference to figure 8 to determine
the signal frames of integer rank. An interpolation
module 124 identical to the module 120 of the coder
estimates the intermediate coefficients cx_i[n-1/2]
from the quantized coefficients cx_q[n-1] and cx_q[n]
provided by the module 47 and/or the module 48 from the
indices icxs, icxi extracted from the stream $\Phi$. A
module for extracting parameters 125 receives the
quantization index icx[n-1/2] from the input
demultiplexer 45 of the decoder, and deduces therefrom
the quantized interpolation error ecx_q[n-1/2] from the
same quantization dictionary as that used by the module
122 of the coder. An adder 126 sums the cepstral
vectors cx_i[n-1/2] and ecx_q[n-1/2] so as to provide
the cepstral coefficients cx[n-1/2] which will be used
by the decoder (modules 51-57, 95, 96, 115 and/or
modules 85-87, 92, 95, 96, 115) so as to form the
interpolated frame of rank n-1/2.

If just some of the cepstral coefficients have formed
the subject of an interpolation error quantization, the
others are determined by the decoder by a simple
interpolation with no correction.
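
The coder and decoder paths of figures 18 and 19 can be sketched together as follows. This is only an illustration under stated assumptions: a uniform scalar quantizer with a hypothetical step `STEP` stands in for the vector quantization of the module 122, and all function names are invented for the example.

```python
STEP = 0.05  # hypothetical quantizer step, not taken from the text

def interpolate(cx_q_prev, cx_q_next):
    """Arithmetic average of two quantized cepstral vectors (modules 120 / 124)."""
    return [(a + b) / 2 for a, b in zip(cx_q_prev, cx_q_next)]

def quantize_error(error):
    """Toy uniform scalar quantizer standing in for the vector quantizer 122."""
    return [round(e / STEP) for e in error]

def dequantize_error(indices):
    """Inverse of quantize_error, playing the role of the module 125."""
    return [k * STEP for k in indices]

def encode_half_frame(cx_q_prev, cx_q_next, cx_half):
    """Coder side: indices icx[n-1/2] for the interpolation error."""
    cx_i = interpolate(cx_q_prev, cx_q_next)
    ecx = [c - ci for c, ci in zip(cx_half, cx_i)]   # subtractor 121
    return quantize_error(ecx)

def decode_half_frame(cx_q_prev, cx_q_next, icx):
    """Decoder side: cepstral coefficients cx[n-1/2] used for synthesis."""
    cx_i = interpolate(cx_q_prev, cx_q_next)          # module 124
    ecx_q = dequantize_error(icx)                     # module 125
    return [ci + e for ci, e in zip(cx_i, ecx_q)]     # adder 126
```

Because coder and decoder run the same interpolation on the same quantized vectors, only the quantized error has to be transmitted for the frame of half-integer rank.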

The decoder can also interpolate the other parameters
$F_0$, Emix used to synthesize the signal frames. The
fundamental frequency $F_0$ can be linearly interpolated,
either in the time domain, or (preferably) directly in
the frequency domain. For the possible interpolation of
the energy weighting vector Emix, it is appropriate to
perform the interpolation after denormalization and
while of course taking account of the time shifts
between frames.

It should be noted that it is especially advantageous,
in order to interpolate the representation of the
spectral envelopes, to perform this interpolation in
the cepstral domain. Unlike an interpolation performed
on other parameters, such as the LSP coefficients
(standing for "Line Spectrum Pairs"), the linear
interpolation of the cepstral coefficients corresponds
to the linear interpolation of the compressed spectral
amplitudes.
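
This property follows from the linearity of the transform linking the two domains. As a short sketch, assume that the compressed spectral amplitudes LX are obtained from the cepstral vector c by a linear transform T (for example a Fourier-type transform; its exact normalization is not needed here), and let $\alpha$ be an interpolation weight introduced only for this illustration:

```latex
\begin{aligned}
LX_1 &= T(c_1), \qquad LX_2 = T(c_2), \\
T\bigl(\alpha\, c_1 + (1-\alpha)\, c_2\bigr)
  &= \alpha\, T(c_1) + (1-\alpha)\, T(c_2) \\
  &= \alpha\, LX_1 + (1-\alpha)\, LX_2 .
\end{aligned}
```

Interpolating the cepstral vectors with weight $\alpha$ therefore yields exactly the compressed envelope that would be obtained by interpolating the compressed envelopes with the same weight.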

In the variant represented in figure 20, the coder uses
the cepstral vectors cx_q[n], cx_q[n-1], ..., cx_q[n-r]
and cx_q[n-1/2] calculated for the last frames which
have passed ($r \ge 1$) so as to identify an optimal
interpolator filter which, when fed with the quantized
cepstral vectors cx_q[n-r], ..., cx_q[n] relating to
the frames of integer rank, delivers an interpolated
cepstral vector cx_i[n-1/2] which exhibits a minimum
distance with the vector cx[n-1/2] calculated for the
last frame of half-integer rank.

In the example represented in figure 20, this
interpolator filter 128 is present in the coder, and a
subtractor 129 deducts its output cx_i[n-1/2] from the
calculated cepstral vector cx[n-1/2]. A minimization
module 130 determines the parameter set {P} of the
interpolator filter 128, for which the interpolation
error ecx[n-1/2] delivered by the subtractor 129
exhibits a minimum norm. This parameter set {P} is
addressed to a quantization module 131 which provides a
corresponding quantization index iP to the output
multiplexer 6 of the coder.
As a function of the bit rate allocated in the stream $\Phi$
to the indices for quantizing the parameters {P}
defining the optimal interpolator filter 128, it will
be possible to adopt a finer or coarser quantization of
these parameters, or a more or less elaborate form of
the interpolator filter, or else to envisage several
interpolator filters quantized differently for various
vectors of cepstral coefficients.

In a simple embodiment, the interpolator filter 128 is
linear, with r = 1:

$$\mathrm{cx\_i}[n-1/2] = \rho \cdot \mathrm{cx\_q}[n-1] + (1-\rho) \cdot \mathrm{cx\_q}[n] \tag{23}$$

and the parameter set {P} is limited to the coefficient
$\rho$ lying between 0 and 1.
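
For the linear filter of relation (23), the minimization has a closed form when the norm is taken to be Euclidean (an assumption; the text only requires a minimum norm). The sketch below computes the least-squares coefficient and clips it to the interval from 0 to 1; the function name `optimal_rho` is hypothetical.

```python
def optimal_rho(cx_q_prev, cx_q_next, cx_half):
    """Least-squares coefficient of relation (23), clipped to [0, 1].

    The interpolation error is (cx_half - cx_q_next) - rho * (cx_q_prev - cx_q_next),
    so the unconstrained minimizer is a ratio of two inner products.
    """
    diff = [a - b for a, b in zip(cx_q_prev, cx_q_next)]
    target = [c - b for c, b in zip(cx_half, cx_q_next)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical neighbouring vectors: every coefficient gives the same output.
        return 0.5
    rho = sum(t * d for t, d in zip(target, diff)) / denom
    # The error is a convex function of rho, so clipping gives the
    # constrained optimum on [0, 1].
    return min(1.0, max(0.0, rho))
```

The returned coefficient would then be quantized (module 131) before being written to the stream.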

From the indices iP for quantizing the parameters {P}
obtained in the binary stream $\Phi$, the decoder
reconstructs the interpolator filter 128 (to within
quantization errors) and processes the cepstral vectors
cx_q[n-r], ..., cx_q[n] so as to estimate the cepstral
coefficients cx[n-1/2] used to synthesize the frames of
half-integer rank.

Generally, the decoder can use a simple interpolation
method (without transmission of parameters by the coder
for the frames of half-integer rank), an
interpolation method with incorporation of a quantized
interpolation error (according to figures 18 and 19),
or an interpolation method with an optimal interpolator
filter (according to figure 20) to evaluate the frames
of half-integer rank in addition to the frames of
integer rank evaluated directly, as explained with
reference to figures 8 to 13. The temporal synthesis
module 116 can then combine the collection of these
evaluated frames so as to form the synthesized signal $\hat{x}$
in the manner explained hereinbelow with reference to
figures 14, 21 and 22.
As in the method of temporal synthesis described above,
the module 116 performs an overlap sum of frames
modified with respect to those evaluated successively
at the output of the module 115, and this modification
can be viewed in two steps of which the first is
identical to that described above with reference to
figure 14 (divide the samples of the frame 2' by the
analysis window $f_A$).

The second step (figure 21) consists in multiplying the
samples of the renormalized frame 2" by a synthesis
window $f'_s$ satisfying the following properties:

$$f'_s(i) = 0 \quad \text{for } 0 \le i < N/2 - M/p \text{ and } N/2 + M/p \le i < N \tag{24}$$

$$f'_s(i) + f'_s(i + M/p) = A \quad \text{for } N/2 - M/p \le i < N/2 \tag{25}$$

where A designates an arbitrary positive constant, for
example A = 1, and p is the integer such that the time
shift between the successive frames (calculated
directly and interpolated) is M/p samples, i.e. p = 2
in the example described. The synthesis window $f'_s(i)$
increases progressively for i going from N/2 - M/p to
N/2. It is, for example, a raised sinusoid on the
interval $N/2 - M/p \le i < N/2 + M/p$. In particular, the
synthesis window $f'_s$ can, over this interval, be a
Hamming window (as represented in figure 21) or a
Hanning window.

Figure 21 shows the successive frames 2" repositioned
over time by the module 116. The hatching indicates the
removed portions of the frames (synthesis window at 0).
It may be seen that by performing the overlap sum of
the samples of the successive frames, the property (25)
ensures homogeneous weighting of the samples of the
synthesized signal.
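
Properties (24) and (25) can be verified numerically for one possible window. The sketch below assumes a Hanning-type raised sinusoid in its periodic form over the interval of 2M/p samples; the function name `f_s_prime` is hypothetical.

```python
import math

N, M, p, A = 256, 160, 2, 1.0
hop = M // p                                  # time shift between successive frames
start, stop = N // 2 - hop, N // 2 + hop      # support of the window

def f_s_prime(i):
    """Raised-sinusoid synthesis window, zero outside [start, stop) per relation (24)."""
    if not start <= i < stop:
        return 0.0
    k = i - start
    # Periodic Hanning shape over 2 * hop samples (assumed form).
    return A / 2 * (1 - math.cos(2 * math.pi * k / (2 * hop)))

# Relation (25): a sample and the sample one hop later sum to A.
pair_sums = [f_s_prime(i) + f_s_prime(i + hop) for i in range(start, N // 2)]
```

All entries of `pair_sums` equal A up to rounding error, so frames shifted by M/p = 80 samples add up to a homogeneous weighting.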
As in the method of synthesis illustrated by figures 14
and 15, the procedure for weighting the frames obtained
by inverse Fourier transform of the spectra Y can be
performed in a single step, with a compound window
$f'_c(i) = f'_s(i)/f_A(i)$. Figure 22 shows the form of the
compound window $f'_c$ in the case where the windows $f_A$ and
$f'_s$ are of Hamming type.

Like the method of temporal synthesis illustrated by
figures 14 to 17, that illustrated by figures 14, 21
and 22 makes it possible to take into account an
overlap L between two analysis frames (for which the
analysis is performed completely) which is smaller than
half the size N of these frames. In general, this
latter method is applicable when the successive
analysis frames exhibit mutual time shifts M of more
than N/2 samples (possibly even of more than N samples
if a very low bit rate is required), the interpolation
leading to a collection of frames whose mutual time
shifts are less than N/2 samples.

The interpolated frames can form the subject of a
reduced transmission of coding parameters, as is
described above, but this is not compulsory. This
embodiment makes it possible to retain a relatively
large interval M between two analysis frames, and hence
to limit the transmission bit rate required, whilst
limiting the discontinuities which are liable to appear
by virtue of the size of this interval with respect to
the typical timescales for the variations in the
parameters of the audio signal, in particular the
cepstral coefficients and the fundamental frequency.

Figures 23 to 25 show other embodiments of the means
employed to process the cepstral coefficients cx_sup
delivered by the IFFT module 13 of figure 1,
representing the upper envelope.
In the three cases, the post-liftering module 15,
normalizing module 16, quantization module 18 and
module for calculating the spectral amplitudes 28 are
essentially identical to those described previously
with reference to figure 1. Furthermore, modules for
post-liftering 140, for smoothing 141 and for
extracting the minimum phase 142 are provided so as to
process the post-liftered and quantized cepstral
coefficients cx_sup_q delivered by the quantization
module 18. These modules 140-142 operate essentially
like the corresponding modules 55-57 of the decoder of
figure 8.

In the embodiment shown in figure 23, the adaptation
module 144 accomplishes a function similar to that of
the module 29 of figure 1. However, the adaptation is
not carried out solely on the basis of the modulus of
the spectrum. The module 144 determines the best set of
coefficients for the post-lifter 15 by minimizing the
discrepancy between the spectrum of the audio signal,
in terms of modulus |X| and phase $\varphi_X$, and the
recalculated complex values for one or more of the
harmonics of the fundamental frequency. The moduli of
these latter complex values are given by the
calculation module 28, and their phases correspond to
the minimum phases $\varphi(k)$ provided by the extraction
module 142. To carry out the adaptation, the module 144
can take into account any appropriate distance in the
complex plane, for example the Euclidean distance.

Thus, the adaptation of the post-lifter 15 by the
module 144 takes account in a combined manner of
frequency aspects of the signal, which are reflected by
the modulus of the spectrum, and of temporal aspects,
which are reflected by the phase of the spectrum.

As represented by dashed lines in figure 23, the post-lifter 140
can also be adaptive, the adaptation performed by the
module 144 pertaining jointly to the two post-lifters
15, 140. In this case, the post-lifter 55 of the
decoder (figure 8) is adapted, like the post-lifter
140, as a function of parameters iLif which the
adaptation module 144 provides to the multiplexer 6 so
that it includes them in the digital stream $\Phi$.
Typically, a few sets of coefficients $\gamma_1$, $\gamma_2$ are
envisaged for the post-lifters 140 and 55, and the
module 144 carries out an exhaustive test of these
various sets of coefficients so as to retain the one
which minimizes the discrepancy in the complex plane.
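
The exhaustive test can be sketched as follows. The sketch is deliberately simplified: the moduli are taken as fixed, and the minimum phases obtained for a given pair of coefficients are supplied by a hypothetical callback `phases_for`, which stands in for the chain of modules 140 to 142. All names are invented for the example.

```python
import cmath

def complex_discrepancy(spectrum, moduli, phases):
    """Squared Euclidean distance in the complex plane, summed over the harmonics."""
    return sum(abs(x - cmath.rect(a, ph)) ** 2
               for x, a, ph in zip(spectrum, moduli, phases))

def select_lifter(candidates, spectrum, moduli, phases_for):
    """Test every (gamma1, gamma2) pair and keep the one with least discrepancy.

    phases_for(gamma1, gamma2) is a hypothetical callback returning the
    minimum phases at the selected harmonics for that pair.
    """
    return min(candidates,
               key=lambda g: complex_discrepancy(spectrum, moduli, phases_for(*g)))
```

The position of the retained pair in the candidate list is the kind of information that parameters such as iLif could carry to the decoder.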

In the example represented in figure 24, the adaptation
module 29 for the post-lifter 15 is identical to that
of figure 1. Figure 24 shows a module 145 for
estimating a masking curve allowing the module 29 to
select, for the minimization of the discrepancy in
terms of modulus, the harmonic frequency or frequencies
which most exceed the masking curve calculated on the
basis of the modulus spectrum |X|, as described above.

The post-lifter 140 of figure 24 is adapted separately
by a module 146 which carries out the minimization of
the discrepancies between the phase $\varphi_X$ of the spectrum
of the signal and the minimum phase $\varphi(k)$ calculated by
the module 142 for one or more of the harmonics. Here
again, the harmonics selected for the calculation of
the minimized phase discrepancy may be chosen as a function
of the masking curve estimated by the module 145. The
module 146 provides the output multiplexer 6 of the
coder with the parameters iLif which represent the
optimal post-lifter 140, so that they are used in the
post-lifter 55 of the decoder.

In the example illustrated by figure 25, the
post-lifter 140 serving in the calculation of the minimum
phases is not adaptive. The minimum phases $\varphi(k)$
calculated by the module 142 for the harmonics of the
fundamental frequency are compared with the phases $\varphi_X$
of the spectrum of the audio signal, and the phase
discrepancy forms the subject of a quantization by a
module 148. The corresponding quantization indices iΔφ
are provided by the module 148 to the output
multiplexer 6 of the coder.

In a decoder (figure 26) corresponding to a coder
according to figure 25, a module 149 utilizes these
quantization indices iΔφ provided by the demultiplexer
45 to obtain the values of the quantized phase
discrepancies, which are added by an adder 150 to the
minimum phases $\varphi(k)$ calculated by the module 57 (the
post-lifters 140 and 55 being identical). The phases
provided by the adder 150 are then used by the module
54 which synthesizes the spectral lines of the harmonic
component $X_v$.

The phase discrepancy quantized by the module 148, and
which is used by the modules 149 and 150 of the decoder
to correct the minimum phases $\varphi(k)$, can be of two
kinds:

- it can represent, for each frequency of index i
  corresponding to a harmonic of order k of the
  fundamental frequency $F_0$, the difference between
  the phase $\varphi_X(i)$ of the spectrum of the signal at
  the frequency i and the minimum phase $\varphi(k)$
  calculated by the module 142 for harmonic k;

- alternatively or cumulatively, this phase
  discrepancy can represent the variation of the
  phase $\varphi_X$ of the spectrum over the width of one or
  more spectral peaks corresponding to harmonics of
  the signal, this variation relating to the minimum
  phase $\varphi(k)$ assigned to the peaks in question.

In both cases, the peak or peaks for which the phase
discrepancy is quantized may be chosen as a function of
the spectral energy represented by the upper envelope,
which is available to the coder and to the decoder,
thereby enabling the decoder to determine the spectral
line to which the discrepancies should be applied.

In the first case, the phase discrepancies may form the
subject of a scalar quantization, or a vector
quantization if they are grouped together for several
peaks.

In the second case, the variation of the phase $\varphi_X$
around the minimum phase $\varphi(k)$ over the width of a
harmonic peak (determined by the width of the reference
line used by the module 54), can be represented simply
by the slope of a linear segment selected as being that
which exhibits a minimum quadratic distance with the
curve of the variation in phase of the spectrum over
the width of the line, and possibly by a shift at the
origin.
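
The linear segment of minimum quadratic distance is an ordinary least-squares line. A minimal sketch, assuming that the phase deviation has been sampled at a few frequency offsets around the harmonic peak (the function and argument names are hypothetical):

```python
def fit_phase_line(offsets, phase_deviation):
    """Least-squares slope and shift at the origin of the phase deviation.

    offsets are frequency offsets relative to the harmonic peak and
    phase_deviation holds the spectrum phase minus the minimum phase at
    those offsets. At least two distinct offsets are required.
    """
    n = len(offsets)
    mean_x = sum(offsets) / n
    mean_y = sum(phase_deviation) / n
    sxx = sum((x - mean_x) ** 2 for x in offsets)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(offsets, phase_deviation))
    slope = sxy / sxx
    shift = mean_y - slope * mean_x
    return slope, shift
```

The slope, and optionally the shift, are the values that would then be quantized for the peak.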

These slopes may form the subject of a scalar
quantization, or a vector quantization if they are
grouped together for several peaks.

The quantization of the phase variations over the
harmonic peaks may pertain to the collection of
harmonic frequencies. Another possibility is to
quantize several slopes each obtained by averaging the
slopes at the harmonics over one or more subbands of
the spectrum. This averaging can be weighted so as to
take account of the energies relating to the various
harmonic frequencies, represented by the upper
envelope.
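
The weighted averaging over a subband can be sketched as below, assuming that the energies read from the upper envelope are positive and are used directly as weights (one possible weighting; the function name is hypothetical):

```python
def weighted_subband_slope(slopes, energies):
    """Energy-weighted average of the phase slopes of the harmonics of one subband.

    energies are assumed to be positive values taken from the upper envelope
    at the corresponding harmonic frequencies.
    """
    total = sum(energies)
    return sum(s * e for s, e in zip(slopes, energies)) / total
```

For slopes 0.1 and 0.3 with energies 3 and 1, the average is pulled towards the slope of the stronger harmonic.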

The module 148 can also model the phase variation over
the width of a peak by a more complex curve than a
linear segment, for example a spline, whose parameters
are quantized so as to be transmitted to the decoder.
Another possibility is to perform prior learning of
phase models at the harmonics, representative of the
phase variations over the width of the peaks, which
variations are observed in a corpus of reference
signals. These models are held in a dictionary stored
by the modules 148 and 149. The module 148 of the coder
determines the indices iΔφ corresponding to the
addresses of the models closest to the phase variations
in the neighborhood of the harmonic peaks considered,
and the module 149 of the decoder recovers these models
for the synthesis of the phase of the harmonic
component.
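
The dictionary search can be sketched as a nearest-neighbour lookup. The squared Euclidean distance is an assumption, since the text only speaks of the closest models, and the function names are hypothetical.

```python
def nearest_model_index(dictionary, observed):
    """Coder side: address of the phase model closest to the observed variation."""
    def squared_distance(model):
        return sum((m - o) ** 2 for m, o in zip(model, observed))
    return min(range(len(dictionary)),
               key=lambda k: squared_distance(dictionary[k]))

def recover_model(dictionary, index):
    """Decoder side: read the model back from the shared dictionary."""
    return dictionary[index]
```

Only the address is transmitted; the decoder holds the same dictionary and reads back the model for the synthesis of the phase.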
CLAIMS

1. A method of coding an audio signal, in which a
fundamental frequency ($F_0$) of the audio signal is
estimated, a spectrum of the audio signal is
determined through a transform in the frequency
domain of a frame of the audio signal, and data
for coding a harmonic component of the audio
signal, comprising data representative of spectral
amplitudes associated with frequencies which are
multiples of the estimated fundamental frequency,
are included in a digital output stream ($\Phi$),
characterized in that said data representative of
spectral amplitudes associated with frequencies
which are multiples of the fundamental frequency
($F_0$) are obtained by means of cepstral
coefficients calculated by transforming in the
cepstral domain a compressed upper envelope
(LX_sup) of the spectrum of the audio signal, and
in that the data for coding the harmonic component
furthermore comprise, for at least one of the
frequencies which are multiples of the estimated
fundamental frequency, data (iΔφ; iLif) relating
to the phase of the spectrum of the audio signal
in the neighborhood of said multiple frequency.

2. The method as claimed in claim 1, in which the
compressed upper envelope (LX_sup) is determined
by interpolation of said spectral amplitudes
associated with the frequencies which are
multiples of the fundamental frequency ($F_0$) with
application of a spectral compression function.

3. The method as claimed in claim 1 or 2, in which
the cepstral coefficients are transformed by
liftering and smoothing in the cepstral domain,
and minimum phases ($\varphi(k)$) of the audio signal at
frequencies which are multiples of the fundamental
frequency ($F_0$) are calculated on the basis of the
transformed cepstral coefficients (cxI[n]).

4. The method as claimed in claim 3, in which a phase
discrepancy between the spectrum of the audio
signal at at least one frequency which is a
multiple of the fundamental frequency ($F_0$) and the
corresponding calculated minimum phase ($\varphi(k)$) is
quantized, and quantization parameters (iΔφ) for
said phase discrepancy are included in the data
for coding the harmonic component.

5. The method as claimed in claim 3, in which said
liftering is adapted in such a way as to minimize
a phase discrepancy between the spectrum of the
audio signal at at least one frequency which is a
multiple of the fundamental frequency ($F_0$) and the
corresponding calculated minimum phase ($\varphi(k)$), and
parameters (iLif) representative of the adapted
liftering are included in the data for coding the
harmonic component.

6. The method as claimed in claim 5, in which the
minimized phase discrepancy for the adaptation of
the liftering relates to at least one frequency
which is a multiple of the fundamental frequency
($F_0$), selected on the basis of the magnitude of
the modulus of the spectrum in absolute value.

7. The method as claimed in claim 5, in which a curve
of spectral masking of the audio signal is
estimated by means of a psycho-acoustic model, and
the minimized phase discrepancy for the adaptation
of the liftering relates to at least one frequency
which is a multiple of the fundamental frequency
($F_0$), selected on the basis of the magnitude of
the modulus of the spectrum in relation to the
masking curve.
8. The method as claimed in any one of claims 3 to 7,
in which the variations of the cepstral
coefficients (cxp[n]) are examined so as to apply
a smoothing of lesser magnitude in the presence of
abrupt variations than in the presence of slow
variations.

9. The method as claimed in claim 8, in which the
smoothing of the cepstral coefficients is
performed by means of a forget factor ($\lambda_c$) chosen
as a function of a comparison between a threshold
($d_{th}$) and a distance (d) between two successive
sets of cepstral coefficients (cxp[n-1], cxp[n]).

10. The method as claimed in claim 9, in which said
threshold ($d_{th}$) is adapted as a function of the
variations of the cepstral coefficients (cxp[n]).

11. The method as claimed in any one of claims 3 to
10, in which the liftering is of the form
$c_p(i) = [1 + \gamma_2^i - \gamma_1^i] \cdot c(i) - (\mu^i / i)$,
where $c(i)$ and $c_p(i)$
designate the cepstral coefficient of order i
respectively before and after liftering, $\gamma_1$ and
$\gamma_2$ are coefficients lying between 0 and 1, and $\mu$ is
a pre-emphasizing coefficient.

12. The method as claimed in claim 11, in which
$\mu = (\gamma_2 - \gamma_1) \cdot c(1)$.



13. The method as claimed in any one of claims 1 to
12, in which a variation of the phase of the
spectrum of the audio signal is quantized over at
least one interval of frequencies surrounding a
frequency which is a multiple of the fundamental
frequency ($F_0$), and quantization parameters (iΔφ)
for said phase variation are included in the data
for coding the harmonic component.

14. The method as claimed in claim 13, in which the
quantized phase variation is represented by the
slope of the phase of the spectrum at the
frequency which is a multiple of the fundamental
frequency ($F_0$).

15. The method as claimed in claim 13, in which the
quantized phase variation represents a phase model
stored in a dictionary determined by learning.

16. An audio coder, comprising means for executing a
method according to any one of claims 1 to 15.

17. A method of decoding a digital input stream ($\Phi$)
representing a coded audio signal, in which data
for coding a fundamental frequency ($F_0$) of the
audio signal and data for coding a harmonic
component comprising quantization data for
cepstral coefficients representing an upper
spectral envelope (X_sup) of the audio signal are
received in the digital input stream, and a
spectral estimate ($X_v$) of the harmonic component
is generated, its modulus being determined by said
upper spectral envelope at the frequencies which
are multiples of the fundamental frequency,
characterized in that, to determine the phase ($\varphi_v$)
of the spectral estimate of the harmonic component
($X_v$), the cepstral coefficients are transformed by
liftering and smoothing in the cepstral domain,
and minimum phases ($\varphi(k)$) of the audio signal at
frequencies which are multiples of the fundamental
frequency are calculated on the basis of the
transformed cepstral coefficients (cxI[n]).

18. The method as claimed in claim 17, in which, to
obtain the phase ($\varphi_v$) of the spectral estimate ($X_v$)
of the harmonic component, at least one calculated
minimum phase ($\varphi(k)$) is modified on the basis of
phase discrepancy quantization data (iΔφ) included
in the data for coding the harmonic component.

19. The method as claimed in claim 18, in which the
phase discrepancy quantization data (iΔφ) are used
to determine a variation of the phase ($\varphi_v$) of the
spectral estimate ($X_v$) of the harmonic component
over at least one interval of frequencies
surrounding a frequency which is a multiple of the
fundamental frequency ($F_0$).



20. The method as claimed in claim 19, in which the
phase variation determined over an interval of
frequencies is a linear variation represented by a
slope quantization parameter.

21. The method as claimed in claim 19, in which the
phase variation determined over an interval of
frequencies represents a phase model stored in a
dictionary determined by learning.

22. The method as claimed in any one of claims 17 to
21, in which said liftering is adapted as a
function of parameters (iLif) included in the data
for coding the harmonic component.

23. The method as claimed in any one of claims 17 to
22, in which the variations of the cepstral
coefficients (cxp[n]) are examined so as to apply
a smoothing of lesser magnitude in the presence of
abrupt variations than in the presence of slow
variations.

24. The method as claimed in claim 23, in which the
smoothing of the cepstral coefficients (cxp[n]) is
performed by means of a forget factor ($\lambda_c$) chosen
as a function of a comparison between a threshold
($d_{th}$) and a distance (d) between two successive
sets of cepstral coefficients (cxp[n-1], cxp[n]).

25. The method as claimed in claim 24, in which said
threshold ($d_{th}$) is adapted as a function of the
variations of the cepstral coefficients (cxp[n]).

26. A method of decoding a digital input stream ($\Phi$)
representing a coded audio signal, in which data
for coding a fundamental frequency ($F_0$) of the
audio signal and data for coding a harmonic
component comprising quantization data for
cepstral coefficients representing an upper
spectral envelope (X_sup) of the audio signal are
received in the digital input stream, and a
spectral estimate ($X_v$) of the harmonic component
is generated, its modulus being determined by said
upper spectral envelope at the frequencies which
are multiples of the fundamental frequency,
characterized in that a phase variation ($\varphi_v$) of
the spectral estimate of the harmonic component is
determined over at least one interval of
frequencies surrounding a frequency which is a
multiple of the fundamental frequency on the basis
of quantization parameters (iΔφ) included in the
data for coding the harmonic component.

27. The method as claimed in claim 26, in which the
phase variation determined over an interval of
frequencies is a linear variation represented by a
slope quantization parameter.

28. The method as claimed in claim 26, in which the
phase variation determined over an interval of
frequencies represents a phase model stored in a
dictionary determined by learning.

29. An audio decoder, comprising means for executing a
method according to any one of claims 17 to 28.